fork of https://github.com/sourcegraph/zoekt

Indexing: respect indexing buffer limit (#686)

When indexing documents, we buffer up documents until we reach the shard size
limit (100MB), then flush the shard. If we decide to skip a document because
it's a binary file, then (naturally) we don't count its content size towards
the shard limit. But we still buffer the full document. So if there is a large
number of binary files, we can easily blow past the 100MB limit and run into
memory issues.

This change simply clears `Content` whenever `SkipReason` is set. The
invariant: a buffered document should only ever have `SkipReason` or `Content`,
not both.
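
To illustrate the invariant, here is a minimal, self-contained sketch. The `document` and `builder` types, the `add`, `flush`, and `heldBytes` helpers, and the sizes used are all hypothetical stand-ins, not zoekt's actual `Builder`; only the accounting branch mirrors the diff in this change.

```go
package main

import "fmt"

// document is a simplified stand-in for an indexable document (assumption).
type document struct {
	Name       string
	Content    []byte
	SkipReason string
}

// builder is a hypothetical, stripped-down buffer, not zoekt's Builder.
type builder struct {
	shardMax int         // flush threshold in counted bytes
	size     int         // bytes counted towards the threshold
	todo     []*document // documents buffered until the next flush
	flushes  int         // number of flushes performed
}

// add buffers doc and flushes once the counted size exceeds shardMax.
func (b *builder) add(doc document) {
	if doc.SkipReason == "" {
		b.size += len(doc.Name) + len(doc.Content)
	} else {
		b.size += len(doc.Name) + len(doc.SkipReason)
		// Skipped content is not counted, so it must not stay buffered either.
		doc.Content = nil
	}
	b.todo = append(b.todo, &doc)

	if b.size > b.shardMax {
		b.flush()
	}
}

// flush stands in for writing a shard; here it only resets the buffer.
func (b *builder) flush() {
	b.todo = nil
	b.size = 0
	b.flushes++
}

// heldBytes reports the content bytes actually held by buffered documents.
func (b *builder) heldBytes() int {
	n := 0
	for _, d := range b.todo {
		n += len(d.Content)
	}
	return n
}

func main() {
	b := &builder{shardMax: 1000}
	for i := 0; i < 5; i++ {
		b.add(document{Name: "blob.bin", Content: make([]byte, 4096), SkipReason: "binary"})
	}
	fmt.Println("counted:", b.size, "held:", b.heldBytes())
}
```

Without the `doc.Content = nil` line, the five skipped documents in `main` would keep 5 × 4096 bytes buffered while the counted size stays far below the threshold, which is the gap between accounting and real memory use described above.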

+6
+3
build/builder.go
@@ -642,6 +642,9 @@
 		b.size += len(doc.Name) + len(doc.Content)
 	} else {
 		b.size += len(doc.Name) + len(doc.SkipReason)
+		// Drop the content if we are skipping the document. Skipped content is not counted towards the
+		// shard size limit, so otherwise we might buffer too much data in memory before flushing.
+		doc.Content = nil
 	}
 
 	if b.size > b.opts.ShardMax {
+3
build/builder_test.go
@@ -244,6 +244,9 @@
 	if len(b.todo) != 1 || b.todo[0].SkipReason == "" {
 		t.Fatalf("document should have been skipped")
 	}
+	if b.todo[0].Content != nil {
+		t.Fatalf("document content should be empty")
+	}
 	if b.size >= 100 {
 		t.Fatalf("content of skipped documents should not count towards shard size thresold")
 	}