zoekt/cmd/ at da3626ec7c7cf15d762aa508abefe9896f90b5e0 · boltless.me/zoekt · Tangled

boltless.me / zoekt

fork of https://github.com/sourcegraph/zoekt

at da3626ec7c7cf15d762aa508abefe9896f90b5e0 18 folders 1 file

techknowlogick add zoekt-mirror-gitea (#844) 2y ago

matchtree: call prepare on symbolRegexpMatchTree subtree (#685)

This was a huge oversight that has lived in our codebase since we introduced symbolRegexpMatchTree. Because we don't call prepare, we don't correctly use the index for symbol regex queries. From some local testing this makes a huge difference to performance. Huge shout-out to @camdencheek who spotted this.

Test Plan: validated with some local searches that results remain the same and that the statistics for the searches go up for IndexBytesLoaded, but go down for ContentBytesLoaded, FilesConsidered, FilesLoaded, etc. Added unit tests which assert the index is used. Also perf tested with hyperfine.

Hyperfine results:

    Benchmark 1: ./zoekt-before -sym '^searcher$'
      Time (mean ± σ): 93.0 ms ± 1.2 ms [User: 142.2 ms, System: 18.9 ms]
      Range (min … max): 90.8 ms … 95.6 ms 31 runs

    Benchmark 2: ./zoekt-after -sym '^searcher$'
      Time (mean ± σ): 52.3 ms ± 0.5 ms [User: 76.3 ms, System: 13.0 ms]
      Range (min … max): 50.7 ms … 53.4 ms 53 runs

    Summary
      './zoekt-after -sym '^searcher$'' ran 1.78 ± 0.03 times faster than './zoekt-before -sym '^searcher$''

For that search, a random comparison of the zoekt stats:

| Stat                  | Before    | After    | Delta      |
|-----------------------|-----------|----------|------------|
| ContentBytesLoaded    | 199007382 | 22566033 | -176441349 |
| IndexBytesLoaded      | 3527      | 165645   | 162118     |
| Crashes               | 0         | 0        | 0          |
| Duration              | 57956167  | 17568708 | -40387459  |
| FileCount             | 28        | 28       | 0          |
| ShardFilesConsidered  | 0         | 0        | 0          |
| FilesConsidered       | 28477     | 766      | -27711     |
| FilesLoaded           | 28477     | 766      | -27711     |
| FilesSkipped          | 0         | 0        | 0          |
| ShardsScanned         | 5         | 5        | 0          |
| ShardsSkipped         | 0         | 0        | 0          |
| ShardsSkippedFilter   | 0         | 0        | 0          |
| MatchCount            | 29        | 29       | 0          |
| NgramMatches          | 87        | 4407     | 4320       |
| NgramLookups          | 644       | 644      | 0          |
| Wait                  | 5792      | 11500    | 5708       |
| MatchTreeConstruction | 498042    | 515248   | 17206      |
| MatchTreeSearch       | 97661875  | 23089418 | -74572457  |

Analysis: An absolutely massive reduction in the number of files we consider. This means we are actually using the index properly, e.g. look at ContentBytesLoaded, Duration, FilesConsidered, FilesLoaded. You can also see that IndexBytesLoaded has gone up since we now use it properly. This was on a small corpus, so it will have a huge impact in production. Note that the changes in Wait and MatchTreeConstruction are random, but the MatchTreeSearch change is a big deal since that is time spent searching after analysing a query.

2 years ago

zoekt-archive-index

zoekt-archive-index: split out ranking tests and archive indexing (#712) We had ranking e2e tests living in the zoekt-archive-index cmd for convenience since that contained useful functions for indexing a remote tarball from the GitHub API. This commit splits the archive functionality into a new internal/archive package and the ranking tests into a new internal/e2e package. The zoekt-archive-index code is now quite minimal. This is similar to how zoekt-git-index mostly just calls out to the gitindex package. What is different is that the archive package is marked internal, unlike gitindex. gitindex should also be internal, but the code predates Go's support for internal. I suspect more of our e2e tests will end up in this package. Test Plan: go test ./...

2 years ago

zoekt-dynamic-indexserver

all: gofumpt -l -w . gofumpt is a stricter gofmt. I took a look at the changes and in general they are nice. I don't think we need to enforce the use of gofumpt, but I like the idea of running it every once in a while. Test Plan: go test ./...

2 years ago

zoekt-git-clone

remove bazel (#634)

2 years ago

zoekt-git-index

Debug: write memory profile if heap exceeds threshold (#819) This PR adds a debugging flag to periodically check memory usage against a threshold. If it exceeds the threshold, then a memory profile like `indexmemory.prof.1` is written to disk. No more than 10 profiles will be written. I've already found this more useful than the existing `-memprofile` flag, so I removed that. It's hard to get insights using that flag, since it only takes a single profile per shard, forces GC, and forces parallelism to 1.

2 years ago

zoekt-index: USAGE message if no arguments supplied (#827) Test Plan: go run ./cmd/zoekt-index prints out useful instructions

2 years ago

zoekt-indexserver

feat: GitLab: exclude user repos (#830)

2 years ago

zoekt-merge-index

remove bazel (#634)

2 years ago

zoekt-mirror-bitbucket-server

remove bazel (#634)

2 years ago

zoekt-mirror-gerrit

gomod: go get -u ./... (#820) Test Plan: go test ./...

2 years ago

zoekt-mirror-gitea

add zoekt-mirror-gitea (#844)

* add zoekt-mirror-gitea
* Clean up setting the default
* update note about topic filtering not being implemented as topics are missing from the API
* cleanup some pointers
* cleanup some code syntax

2 years ago

zoekt-mirror-github

remove bazel (#634)

2 years ago

zoekt-mirror-gitiles

remove bazel (#634)

2 years ago

zoekt-mirror-gitlab

feat: GitLab: exclude user repos (#830)

2 years ago

zoekt-repo-index

Use single map for collecting files across branches (#839) When looking at large profiles for `inuse_space` on dot com, I noticed the filename maps in `prepareNormalBuild` taking a bunch of memory. This PR avoids allocating a separate map per branch, instead having `RepoWalker` collect all the entries in a single instance variable.

2 years ago
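
The idea in the zoekt-repo-index entry above, one shared map instead of one map per branch, can be illustrated with a small sketch. The `walker` and `fileKey` types here are hypothetical stand-ins and do not reproduce `RepoWalker` or `prepareNormalBuild`; they only show the allocation pattern the commit message describes.

```go
package main

import "fmt"

// fileKey identifies one version of a file. Hypothetical type for illustration.
type fileKey struct {
	Path   string
	BlobID string
}

// walker is a hypothetical stand-in for a repository walker. Rather than
// building a separate map for every branch, it keeps a single map as an
// instance variable and records which branches contain each file version.
type walker struct {
	files map[fileKey][]string // file version -> branches that contain it
}

func newWalker() *walker {
	return &walker{files: make(map[fileKey][]string)}
}

// collect adds the entries seen on one branch to the shared map.
func (w *walker) collect(branch string, entries []fileKey) {
	for _, k := range entries {
		w.files[k] = append(w.files[k], branch)
	}
}

func main() {
	w := newWalker()
	readme := fileKey{Path: "README.md", BlobID: "abc"}
	mainGo := fileKey{Path: "main.go", BlobID: "def"}

	w.collect("main", []fileKey{readme, mainGo})
	w.collect("dev", []fileKey{readme})

	fmt.Println(len(w.files), w.files[readme], w.files[mainGo])
}
```

A file version shared by many branches is stored once as a key, which is where the memory saving over per-branch maps would come from in this sketch.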

zoekt-sourcegraph-indexserver

sourcegraph: fix wrong git config (#841) Turns out git config doesn't support "_" in the keys. https://git-scm.com/docs/git-config/2.22.0#_configuration_file "The variable names are case-insensitive, allow only alphanumeric characters and -, and must start with an alphabetic character." Test plan: New unit test

2 years ago
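
The git-config rule quoted in the zoekt-sourcegraph-indexserver entry above is easy to check mechanically. The sketch below is a hypothetical helper, not code from zoekt; it applies the rule to the variable name only (the last dotted component of a key), and the sample names are made up.

```go
package main

import "fmt"

// validGitConfigVarName reports whether name follows the quoted rule: only
// ASCII alphanumeric characters and '-', starting with an alphabetic
// character. Hypothetical helper for illustration; it is not part of zoekt.
func validGitConfigVarName(name string) bool {
	if name == "" {
		return false
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		isAlpha := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
		isDigit := c >= '0' && c <= '9'
		switch {
		case isAlpha:
			// letters are allowed in any position
		case i > 0 && (isDigit || c == '-'):
			// digits and '-' are allowed after the first character
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"archive-name", "archive_name", "9lives"} {
		fmt.Printf("%-14s valid=%v\n", name, validGitConfigVarName(name))
	}
}
```

A check like this at the point where keys are constructed would reject an underscore before it reaches git, which is the class of bug the commit fixes.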

all: gofumpt -l -w . gofumpt is a stricter gofmt. I took a look at the changes and in general they are nice. I don't think we need to enforce the use of gofumpt, but I like the idea of running it every once in a while. Test Plan: go test ./...

2 years ago

zoekt-webserver

Report GCP profiles from zoekt-git-index (#816) This PR initializes the GCP profiler in the `zoekt-git-index` process so we can examine CPU and memory usage for the indexing process itself.

2 years ago

add zoekt-mirror-gitea (#844)

* add zoekt-mirror-gitea
* Clean up setting the default
* update note about topic filtering not being implemented as topics are missing from the API
* cleanup some pointers
* cleanup some code syntax

2 years ago