boltless.me/zoekt
fork of https://github.com/sourcegraph/zoekt
zoekt/internal at 388cba654f9ede9d576ba3591f9df10820215df2
15 folders
Stefan Hengl
scoring: remove IDF from BM25 scoring (#912)
1y ago
b437dc7b
archive
scoring: remove IDF from BM25 scoring (#912) We remove IDF from our BM25 scoring, effectively treating it as constant. This is supported by our evaluations, which showed that for keyword-style queries, IDF can down-weight the score of important keywords too much, leading to a worse ranking. The intuition is that for code search, each keyword is important independently of how frequently it appears in the corpus. Removing IDF allows us to apply BM25 scoring to a wider range of query types. Previously, BM25 was limited to queries with individual terms combined using OR, as IDF was calculated on the fly at query time. Test plan: updated tests
1 year ago
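The scoring change described in #912 above can be illustrated with a small sketch. This is not zoekt's actual implementation: the function name `bm25NoIDF`, its parameters, and the values of `k1` and `b` are assumptions chosen for illustration. It shows the standard BM25 term-frequency saturation formula with the IDF factor treated as the constant 1, so every query term contributes independently of how common it is in the corpus.

```go
package main

import "fmt"

// bm25NoIDF is a hypothetical helper (not part of zoekt) that sums the
// BM25 term-frequency component for each query term, with IDF fixed at 1.
//
// termFreqs holds the number of occurrences of each query term in the document.
// docLen and avgDocLen are the document length and the average document length.
// k1 and b are the usual BM25 tuning parameters.
func bm25NoIDF(termFreqs []float64, docLen, avgDocLen, k1, b float64) float64 {
	// Length normalization shared by all terms in this document.
	norm := k1 * (1 - b + b*docLen/avgDocLen)

	score := 0.0
	for _, tf := range termFreqs {
		if tf <= 0 {
			continue // a term that does not occur contributes nothing
		}
		// Classic BM25 would multiply this by idf(term); here it is constant.
		score += tf * (k1 + 1) / (tf + norm)
	}
	return score
}

func main() {
	// Assumed parameter values, commonly used defaults for BM25.
	const k1, b = 1.2, 0.75

	// Two query terms, occurring 3 times and 1 time in a document of average length.
	fmt.Println(bm25NoIDF([]float64{3, 1}, 100, 100, k1, b))
}
```

Because no corpus-wide statistic is needed, a score of this shape can be computed per document for any query type, which matches the commit's stated motivation for dropping the query-time IDF calculation.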
ctags
Move several packages to internal/ (#901) This PR moves the following packages to `internal` to avoid exposing them in the API: `ctags`, `debugserver`, `gitindex`, `shards`, `trace`
1 year ago
debugserver
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
e2e
scoring: remove IDF from BM25 scoring (#912) We remove IDF from our BM25 scoring, effectively treating it as constant. This is supported by our evaluations, which showed that for keyword-style queries, IDF can down-weight the score of important keywords too much, leading to a worse ranking. The intuition is that for code search, each keyword is important independently of how frequently it appears in the corpus. Removing IDF allows us to apply BM25 scoring to a wider range of query types. Previously, BM25 was limited to queries with individual terms combined using OR, as IDF was calculated on the fly at query time. Test plan: updated tests
1 year ago
gitindex
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
json
Move several packages to internal/ (#901) This PR moves the following packages to `internal` to avoid exposing them in the API: `ctags`, `debugserver`, `gitindex`, `shards`, `trace`
1 year ago
languages
feat(Search): Add support for all Apex language extensions (#799) * feat(Search): Add support for all Apex language extensions * clean up comment * Fix typo
2 years ago
mockSearcher
remove bazel (#634)
2 years ago
otlpenv
remove bazel (#634)
2 years ago
profiler
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
shards
Rename IndexBuilder -> ShardBuilder (#908) When navigating the code, I've often forgotten the difference between `NewBuilder` and `NewIndexBuilder`. This rename clarifies that one of these indexes a whole repo, while the other builds individual shards. Also `index.NewShardBuilder` sounds better.
1 year ago
syntaxutil
all: use a faster vendored regexp/syntax/Regexp.String (#753) We replace all calls to Regexp.String with a vendored version which is faster. go1.22 introduced a commit which "minimizes" the string returned by Regexp.String(). Part of what it does is enumerate the literal runes in your string to calculate flags related to unicode and case sensitivity. This can be quite slow, but is made worse by the fact that we call it per shard per regexp in your query.Q to construct the matchtree. Currently Regexp.String() represents 40% of CPU time on sourcegraph.com. Before go1.22 it was ~0%. Note: This is a temporary change to resolve the issue. I have a deeper change to make this less clumsy. Note: In one place we remove the use of string by relying on Regexp.Equal instead. Test Plan: go test
2 years ago
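The note in #753 above about relying on `Regexp.Equal` instead of comparing strings can be sketched as follows. This is an illustration, not zoekt's code: the helper name `sameRegexp` is hypothetical, and the choice of `syntax.Perl` flags is an assumption. It uses the standard library's `regexp/syntax` package, where `(*Regexp).Equal` compares two parsed expressions structurally and so avoids calling `String()` on either one.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// sameRegexp is a hypothetical helper that reports whether two patterns
// parse to structurally identical syntax trees. It compares the parsed
// trees directly rather than comparing the output of Regexp.String().
func sameRegexp(a, b string) (bool, error) {
	ra, err := syntax.Parse(a, syntax.Perl)
	if err != nil {
		return false, fmt.Errorf("parsing %q: %w", a, err)
	}
	rb, err := syntax.Parse(b, syntax.Perl)
	if err != nil {
		return false, fmt.Errorf("parsing %q: %w", b, err)
	}
	return ra.Equal(rb), nil
}

func main() {
	same, err := sameRegexp(`foo.*bar`, `foo.*bar`)
	fmt.Println(same, err)

	same, err = sameRegexp(`foo.*bar`, `foo.+bar`)
	fmt.Println(same, err)
}
```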
tenant
Move several packages to internal/ (#901) This PR moves the following packages to `internal` to avoid exposing them in the API: `ctags`, `debugserver`, `gitindex`, `shards`, `trace`
1 year ago
trace
Move several packages to internal/ (#901) This PR moves the following packages to `internal` to avoid exposing them in the API: `ctags`, `debugserver`, `gitindex`, `shards`, `trace`
1 year ago
tracer
remove bazel (#634)
2 years ago