alpha
Login
or
Join now
boltless.me
/
zoekt
Star
0
Fork
0
Atom
Configure Feed
Issues
Pull Requests
Commits
Tags
Feed URL
Select the types of activity you want to include in your feed.
fork of https://github.com/sourcegraph/zoekt
Star
0
Fork
0
Atom
Configure Feed
Issues
Pull Requests
Commits
Tags
Feed URL
Select the types of activity you want to include in your feed.
Overview
Issues
Pulls
Pipelines
zoekt
/
index
/
at
fbc543834aa2608b2ed5440853ad9a46b8101391
38 files
Julie Tibshirani
ranking: incorporate file signals into BM25F (#922)
1y ago
fbc54383
bits.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
bits_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
btree.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
btree_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
builder.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
builder_test.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
contentprovider.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
contentprovider_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
ctags.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
ctags_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
eval.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
eval_test.go
Rename IndexBuilder -> ShardBuilder (#908) When navigating the code, I've often forgotten the difference between `NewBuilder` and `NewIndexBuilder`. This rename clarifies that one of these indexes a whole repo, while the other builds individual shards. Also `index.NewShardBuilder` sounds better.
1 year ago
file_category.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
file_category_test.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
hititer.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
hititer_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
index_test.go
sourcegraph-indexserver: add grpc server (#920) Relates to SPLF-874 This adds a grpc server to sourcegraph-indexserver. For now it supports just one method. The diff is quite big, so I left comments to mark the most important bits. I used the opportunity to clean up a bit (=> hence the big diff): - Reuse grpc logic from webserver and move those bits to "/gprc/..." - Move "protos" inside the new "grpc" directory (=> requires changes of import statements in Sourcegraph) - Refactor import aliases for grpc packages across the codebase Test plan: I tested this locally by calling the new grpc endpoint directly.
1 year ago
indexdata.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
indexdata_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
indexfile.go
Fully revert windows support (#909) We added support for Windows in https://github.com/sourcegraph/zoekt/pull/535. We then partially reverted the change, specifically the parts related to mmap (#706). This was okay because we no longer build for Windows. This PR fully removes support to avoid being in a partially-implemented state.
1 year ago
limit.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
limit_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
matchiter.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
matchiter_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
matchtree.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
matchtree_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
merge.go
Rename IndexBuilder -> ShardBuilder (#908) When navigating the code, I've often forgotten the difference between `NewBuilder` and `NewIndexBuilder`. This rename clarifies that one of these indexes a whole repo, while the other builds individual shards. Also `index.NewShardBuilder` sounds better.
1 year ago
merge_test.go
Rename IndexBuilder -> ShardBuilder (#908) When navigating the code, I've often forgotten the difference between `NewBuilder` and `NewIndexBuilder`. This rename clarifies that one of these indexes a whole repo, while the other builds individual shards. Also `index.NewShardBuilder` sounds better.
1 year ago
read.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
read_test.go
repoID bitmap for speeding up findShard in compound shards (#899) We add a new section to shards which contains a roaring bitmap for quickly checking if a shard contains a repo ID. We then can load just this (small amount) of data to rule out a compound shard. We use roaring bitmaps since we already have that dependency in our codebase. The reason we speed up this operation is we found on a large instance which contained thousands of tiny repos we spent so much time in findShard that our indexing queue would always fall behind. It is possible this new section won't speed this up enough and we need some sort of global oracle (or in-memory cache in indexserver?). This is noted in the code for future travellers. Test Plan: the existing unit tests already cover if this is forwards and backwards compatible. Additionally I added some logging to zoekt to test if older version of shards still work correctly in findShard, as well as if older versions of zoekt can read the new shards. Added a benchmark to check the impact. See comments in the code. --------- Co-authored-by: Stefan Hengl <stefan@sourcegraph.com>
1 year ago
score.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
section.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
shard_builder.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
shard_builder_test.go
Tackle the issue of XML files filtered as binaries in search results (#910) When skipping a doc, we currently report the detected language as "binary" (if it looks like binary) or "skipped" (if it's skipped for any other reason). Skipped docs are still added to the index and can still be returned as search results, for example if you only match on filename. So sometimes file matches are returned with "skipped" as their language, even though the file path is clearly some other language like XML. This PR updates the indexing logic to still detect the language even if the document is skipped. However, we avoid passing the contents to the language detection library to avoid running detection on huge files. --------- Co-authored-by: Julie Tibshirani <julietibs@apache.org>
1 year ago
toc.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago
tombstones.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
tombstones_test.go
Move root-level index code to index package (#902) In the repo root, we have a bunch of low level logic around index building and searching. So we end up exposing internal logic through the main public `zoekt` package, for example `zoekt.Merge(...)`. This PR moves it into the `build` package, so all code related to index building lives together. It then renames `build` to `index` to reflect the broader focus on indexing and searching the index.
1 year ago
write.go
ranking: incorporate file signals into BM25F (#922) This PR reworks the way we incorporate file signals into BM25. Previously, we were applying them as a tie-breaker. But in dogfooding, we found that these rarely impact results, because it's so rare to have a tie in BM25 scores. Now, we take the file signal into account when computing BM25F. The interpretation is that this data lives in a separate 'field' that is half the priority of regular content.
1 year ago