boltless.me/zoekt
fork of https://github.com/sourcegraph/zoekt
zoekt/build at 687cafc8f702e6c0efa0b562b35c0eee619a88d8
1 folder
7 files
Julie Tibshirani
Boost symbol matches in BM25 (#876)
2y ago
c03b77fb
testdata
Add benchmark for ctags conversion (#679) This change adds a benchmark for the conversion from ctags output to Zoekt document data, plus a tiny optimization to presize the symbol slices.
2 years ago
builder.go
Boost symbol matches in BM25 (#876) When digging into our Natural Language Search (NLS) eval results, I found that one of the leading causes for flexible search types like "Fuzzy symbol search" and "Find logic" was noisy matches in top results. Currently, our BM25 ranking rewards any substring match equally. So for queries like 'extract tar', any match on 'tar' (even within unrelated terms like 'start', etc.) counts towards the term frequency. This PR helps reduce noise by boosting symbol matches the same as we do filename matches. Our NLS evals show positive improvement, and context evals are the tiniest bit better.
2 years ago
builder_test.go
build: use enry to detect low priority files (#829)
This is a much more robust detection mechanism. Additionally we have these signals we can also add in:
func IsConfiguration(path string) bool
func IsDocumentation(path string) bool
func IsDotFile(path string) bool
func IsImage(path string) bool
My main concern with this change is generated file detection on content using up RAM or CPU. Will monitor this impact on pprof in production.
Test Plan: go test.
2 years ago
builder_unix.go
Allow wasm compilation (#786)
2 years ago
ctags.go
all: use stdlib slices package (#735) Noticed we weren't using this yet and that the API signatures had changed. Test Plan: go test
2 years ago
ctags_test.go
Indexing: clean up ctags parser wrapper (#708)
This change cleans up the Go ctags parser wrapper as a follow-up to #702. Specific changes:
* Remove synchronization in `lockedParser` and rename it to `CTagsParser`
* Push delegation to universal vs. SCIP ctags into parser wrapper
* Simplify document timeout logic
* Rename some files
2 years ago
e2e_test.go
sourcegraph: multi-tenant Zoekt (#859)
This updates webserver and sourcegraph-indexserver to support multi-tenancy. The change is behind an ENV feature-flag. Key changes:
- tenant ID is now part of the index (repo metadata)
- GRPC: IndexOption and Repository have a new field TenantId
- If multi-tenancy is enabled, webserver checks if tenant in context matches the tenant id in the shard
- zoekt-git-index has a new parameter "-shard_prefix ". If set, the value will be used instead of repository name as prefix for the name of the shard. For Sourcegraph we use "<tenant id>_<repository id>" as prefix if multi-tenancy is enabled
Assumption: All calls to Sourcegraph are privileged
Test plan:
- New tests
- Ran this together with Sourcegraph (with and without MT enabled)
2 years ago
scoring_test.go
Boost symbol matches in BM25 (#876) When digging into our Natural Language Search (NLS) eval results, I found that one of the leading causes for flexible search types like "Fuzzy symbol search" and "Find logic" was noisy matches in top results. Currently, our BM25 ranking rewards any substring match equally. So for queries like 'extract tar', any match on 'tar' (even within unrelated terms like 'start', etc.) counts towards the term frequency. This PR helps reduce noise by boosting symbol matches the same as we do filename matches. Our NLS evals show positive improvement, and context evals are the tiniest bit better.
2 years ago
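The commit message for #876 describes BM25 counting every substring match equally toward term frequency, and then boosting symbol matches the same way filename matches are boosted. The sketch below is one way to picture that idea, not the code in builder.go or elsewhere in Zoekt: the `match` type, the `boost` value, and the function names are hypothetical, and the BM25 constants are the common textbook defaults rather than values taken from the repository. IDF is left out to keep the example short.

```go
package main

import "fmt"

// match is a hypothetical record of one query-term occurrence in a document.
type match struct {
	Term     string
	InSymbol bool // occurrence overlaps a symbol definition
	InName   bool // occurrence is in the file name
}

// boost is an assumed weight for symbol and file-name matches.
// The commit message does not state the value Zoekt uses.
const boost = 5.0

// termFrequencies sums weighted occurrences per term. Plain content
// matches count as 1; symbol and file-name matches count as boost.
func termFrequencies(ms []match) map[string]float64 {
	tf := make(map[string]float64)
	for _, m := range ms {
		w := 1.0
		if m.InSymbol || m.InName {
			w = boost
		}
		tf[m.Term] += w
	}
	return tf
}

// bm25 applies the standard BM25 saturation to each term frequency.
// docLen/avgLen is the length-normalization ratio.
func bm25(tf map[string]float64, docLen, avgLen float64) float64 {
	const k1, b = 1.2, 0.75 // textbook defaults, assumed here
	norm := k1 * (1 - b + b*docLen/avgLen)
	score := 0.0
	for _, f := range tf {
		score += f * (k1 + 1) / (f + norm)
	}
	return score
}

func main() {
	// Two incidental hits on "tar", e.g. inside "start".
	noisy := []match{{Term: "tar"}, {Term: "tar"}}
	// One hit on "tar" inside a symbol name.
	symbol := []match{{Term: "tar", InSymbol: true}}

	fmt.Println("noisy: ", bm25(termFrequencies(noisy), 100, 100))
	fmt.Println("symbol:", bm25(termFrequencies(symbol), 100, 100))
}
```

With these assumed weights, a single symbol match outranks two incidental substring matches, which is the kind of noise reduction the commit message describes for queries like 'extract tar'.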