zoekt/build/ at 5e2620e0cf642a18f3f85da79122b7d1e20e6193 · boltless.me/zoekt · Tangled

boltless.me / zoekt

fork of https://github.com/sourcegraph/zoekt

zoekt / build /

at 5e2620e0cf642a18f3f85da79122b7d1e20e6193 1 folder 7 files

Julie Tibshirani Indexing: properly block on shard building (#689) 2y ago

Add benchmark for ctags conversion (#679) This change adds a benchmark for the conversion from ctags output to Zoekt document data, plus a tiny optimization to presize the symbol slices.

2 years ago

Indexing: properly block on shard building (#689) When indexing, we build shards in parallel based on the `parallelism` flag. Each shard handles ~100MB of document contents, which should limit the memory usage to roughly `100MB * parallelism`. Looking at the size of the buffered document contents in memory profiles, we see much higher usage than this. The issue seems to be that we continue to buffer up documents even if all threads are busy building shards. This can be a real problem if shards take a super long time to build (say because ctags is slow) -- we could end up buffering a ton of content into memory at once. This change fixes the throttling logic so we block indexing when all threads are busy building shards.

2 years ago
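
The throttling described in #689 can be sketched with a buffered channel used as a semaphore. This is a minimal illustration, not Zoekt's actual code: `shardBuilder`, `flush`, and `wait` are hypothetical names, and the only assumption taken from the commit message is that the producer should block once `parallelism` shard builds are in flight.

```go
package main

import (
	"fmt"
	"sync"
)

// shardBuilder is a hypothetical stand-in for an indexer that builds shards
// in parallel. It is not Zoekt's Builder type.
type shardBuilder struct {
	throttle chan struct{} // one slot per concurrent shard build
	wg       sync.WaitGroup
	mu       sync.Mutex
	built    int
}

func newShardBuilder(parallelism int) *shardBuilder {
	return &shardBuilder{throttle: make(chan struct{}, parallelism)}
}

// flush hands a batch of buffered documents to a worker goroutine. The send
// on b.throttle blocks the caller while every slot is taken, so the producer
// cannot keep buffering documents while all workers are busy.
func (b *shardBuilder) flush(docs [][]byte, build func([][]byte)) {
	b.throttle <- struct{}{} // blocks when `parallelism` builds are in flight
	b.wg.Add(1)
	go func() {
		defer b.wg.Done()
		defer func() { <-b.throttle }() // free the slot when the build ends
		build(docs)
		b.mu.Lock()
		b.built++
		b.mu.Unlock()
	}()
}

// wait blocks until all in-flight builds finish and returns how many ran.
func (b *shardBuilder) wait() int {
	b.wg.Wait()
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.built
}

func main() {
	b := newShardBuilder(2)
	for i := 0; i < 5; i++ {
		b.flush([][]byte{[]byte("doc")}, func([][]byte) {})
	}
	fmt.Println("shards built:", b.wait())
}
```

Because the producer acquires the slot before starting the goroutine, buffered content stays bounded by roughly one shard per slot plus the batch currently being filled.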

builder_test.go

Indexing: respect indexing buffer limit (#686) When indexing documents, we buffer up documents until we reach the shard size limit (100MB), then flush the shard. If we decide to skip a document because it's a binary file, then (naturally) we don't count its content size towards the shard limit. But we still buffered the full document. So if there are a large number of binary files, we could easily blow past the 100MB limit and run into memory issues. This change simply clears `Content` whenever `SkipReason` is set. The invariant: a buffered document should only ever have `SkipReason` or `Content`, not both.

2 years ago
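
The invariant from #686 (a buffered document has either `SkipReason` or `Content`, never both) might look like the sketch below. The `document` and `buffer` types are simplified, hypothetical stand-ins, and treating any content with a NUL byte as binary is an assumption made only for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// document is a simplified, hypothetical stand-in for an indexed document.
type document struct {
	Name       string
	Content    []byte
	SkipReason string
}

// buffer is a hypothetical document buffer that tracks buffered content size.
type buffer struct {
	docs []document
	size int
}

// add buffers a document. If the document is skipped, its content is dropped
// so that it neither counts towards the size limit nor occupies memory.
func (b *buffer) add(d document) {
	// Assumption for this sketch: a NUL byte marks the content as binary.
	if d.SkipReason == "" && bytes.IndexByte(d.Content, 0) >= 0 {
		d.SkipReason = "binary file"
	}
	if d.SkipReason != "" {
		d.Content = nil // invariant: SkipReason and Content are never both set
	}
	b.size += len(d.Content)
	b.docs = append(b.docs, d)
}

func main() {
	b := &buffer{}
	b.add(document{Name: "a.txt", Content: []byte("hello")})
	b.add(document{Name: "b.bin", Content: []byte{1, 0, 2}})
	fmt.Println("buffered bytes:", b.size)
}
```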

builder_unix.go

Swap out all usages of the `syscall` package (#513) with the `golang.org/x/sys/unix` package. `syscall` has been frozen since Go 1.3 and deprecated (https://go.dev/doc/go1.4#major_library_changes). Using the `golang.org/x/sys/unix` package will bring in bug fixes and enhancements since `syscall` was frozen in 1.3, and will pave the way for multi-platform builds (which will affect only the single-program local install, most likely).

3 years ago

Indexing: improve skipped doc handling (#687) This change makes a couple of small improvements to how we handle skipped docs:
* Immediately skip ctags parsing if the content is `nil`
* Always sort skipped docs to the end of the shard. This seems like a nice invariant. And generally it's good for performance to group data that is expected to be accessed together and has similar content.

2 years ago
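
Sorting skipped docs to the end of a shard, as #687 describes, can be done with a stable sort so that the relative order within each group is preserved. The `doc` type and `sortSkippedLast` function are hypothetical names for this sketch; the actual ordering logic in Zoekt may consider more than the skip status.

```go
package main

import (
	"fmt"
	"sort"
)

// doc is a simplified, hypothetical stand-in for a buffered document.
type doc struct {
	Name       string
	SkipReason string
}

// sortSkippedLast moves skipped documents to the end of the slice while
// keeping the original order inside the kept and skipped groups.
func sortSkippedLast(docs []doc) {
	sort.SliceStable(docs, func(i, j int) bool {
		// A kept document sorts before a skipped one; otherwise no preference.
		return docs[i].SkipReason == "" && docs[j].SkipReason != ""
	})
}

func main() {
	docs := []doc{
		{Name: "a.go"},
		{Name: "b.bin", SkipReason: "binary file"},
		{Name: "c.go"},
	}
	sortSkippedLast(docs)
	for _, d := range docs {
		fmt.Println(d.Name)
	}
}
```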

build: faster newLinesIndices via bytes.IndexByte and buffer re-use (#680)
Firstly we use bytes.IndexByte for faster newLinesIndices. On my machine this reduces wall clock time of BenchmarkTagsToSections by 38%. This is faster since bytes.IndexByte relies on CPU-specific optimizations to find the next newline (e.g. it uses AVX2 if available).
Secondly we reuse the nls slice between calls to tagsToSections. I noticed in the profiler a not insignificant chunk of time spent in the garbage collector. The slice built by newLinesIndices is allocated and thrown away on each call to tagsToSections, so we can re-use it. This commit does so by introducing a struct that stores the buffer. We now use this buffer per shard of symbols we analyse.
old time/op    new time/op    delta
188µs ± 7%     101µs ± 3%     -46.10% (p=0.000 n=10+10)
old alloc/op   new alloc/op   delta
79.3kB ± 0%    36.3kB ± 0%    -54.24% (p=0.000 n=9+10)
old allocs/op  new allocs/op  delta
443 ± 0%       441 ± 0%       -0.45% (p=0.000 n=10+10)
Test Plan: go test -bench BenchmarkTagsToSections

2 years ago
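
The two optimizations in #680 can be illustrated as follows. This is a sketch rather than the code from the commit: `lineIndexer` is a hypothetical struct name, and the `uint32` offset type is an assumption. `bytes.IndexByte` is the real standard-library function the commit message refers to.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineIndexer is a hypothetical struct that owns a reusable buffer, so the
// newline index slice is not reallocated on every call.
type lineIndexer struct {
	nls []uint32
}

// newLinesIndices returns the byte offsets of every '\n' in in. The returned
// slice aliases the internal buffer and is only valid until the next call.
func (li *lineIndexer) newLinesIndices(in []byte) []uint32 {
	out := li.nls[:0] // reuse the backing array from the previous call
	off := 0
	for {
		// bytes.IndexByte uses an optimized search for the next newline.
		i := bytes.IndexByte(in[off:], '\n')
		if i < 0 {
			break
		}
		out = append(out, uint32(off+i))
		off += i + 1
	}
	li.nls = out
	return out
}

func main() {
	var li lineIndexer
	fmt.Println(li.newLinesIndices([]byte("a\nbc\n\nd")))
}
```

Returning a slice that aliases the internal buffer trades safety for fewer allocations, so a caller that needs the offsets after the next call has to copy them first.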

Indexing: improve skipped doc handling (#687) This change makes a couple of small improvements to how we handle skipped docs:
* Immediately skip ctags parsing if the content is `nil`
* Always sort skipped docs to the end of the shard. This seems like a nice invariant. And generally it's good for performance to group data that is expected to be accessed together and has similar content.

2 years ago

scoring_test.go

Scoring: test against local scip-ctags (#677) This change refactors our end-to-end scoring tests and enables local testing using the scip-ctags binary:
* Split scoring tests out of `e2e_test` and into their own file `scoring_test`
* Split huge test methods into targeted ones like `TestFileNameMatch`, `TestJava`, `TestGo`, etc.
* For languages that scip-ctags supports, rerun the same cases using the scip-ctags binary

To run scip-ctags tests locally, you can set the env variable
```
SCIP_CTAGS_COMMAND=<sourcegraph-repo>/dev/scip-ctags-dev
```
This doesn't yet update Zoekt CI to run scip-ctags tests. That will be tackled in a follow-up.

2 years ago