zoekt/testdata/ at 120aebf6db233e5f5179f0e1223e3a7b6b70ec5c · boltless.me/zoekt · Tangled

fork of https://github.com/sourcegraph/zoekt

at 120aebf6db233e5f5179f0e1223e3a7b6b70ec5c 5 folders 1 file

Stefan Hengl scoring: score files based on absolute number of atoms (#542) 3y ago

scoring: score files based on absolute number of atoms (#542)

This changes how the atom-score is calculated to better reflect the complexity of the result or the user's intent. Currently, the atom-score is the ratio "atomMatchCount/totalAtomAcount". However, "totalAtomAcount" is based on the pruned match-tree, which may be very different from the user query. For example, the match-tree for the query "foo or bar" is pruned to a match-tree representing the query "foo" if the shard doesn't contain the trigram "bar". In an extreme case, a query like "foo or bar or bas or qux" can receive the maximum atom-score if a repo contains matches only for foo. Assuming that the score of a match should reflect how close the match is to the user's intent, we should instead base the atom-score on the original, unpruned match-tree. However, the original match-tree is already based on a simplified query (see "d.simplify(q)" in eval.go). Hence I change the scoring function for the atom-score to be based on the absolute count and to asymptotically approach scoreFactorAtomMatch. This also makes the score more comparable across shards.

3 years ago
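The difference between the old ratio and a score based on the absolute count can be sketched in Go. This is a minimal sketch under stated assumptions: the value 400 for scoreFactorAtomMatch is a placeholder, and the saturating function n/(n+1) is a hypothetical choice. The commit only says that the new score uses the absolute count and approaches scoreFactorAtomMatch asymptotically, not which function #542 uses.

```go
package main

import "fmt"

// scoreFactorAtomMatch is a placeholder weight for this sketch; the real
// constant in zoekt's scoring code may have a different value.
const scoreFactorAtomMatch = 400.0

// ratioAtomScore is the old formula described above: the fraction of atoms
// in the (possibly pruned) match-tree that matched, scaled by the weight.
func ratioAtomScore(atomMatchCount, totalAtomCount int) float64 {
	if totalAtomCount == 0 {
		return 0
	}
	return scoreFactorAtomMatch * float64(atomMatchCount) / float64(totalAtomCount)
}

// absoluteAtomScore is a hypothetical saturating function of the absolute
// number of matched atoms: 0 for no matches, rising with every extra atom,
// and approaching scoreFactorAtomMatch as the count grows.
func absoluteAtomScore(atomMatchCount int) float64 {
	n := float64(atomMatchCount)
	return scoreFactorAtomMatch * n / (n + 1)
}

func main() {
	// "foo or bar or bas or qux" pruned down to "foo" in one shard:
	// 1 of 1 remaining atoms matches, so the ratio gives the maximum score.
	fmt.Println(ratioAtomScore(1, 1)) // 400

	// The absolute count does not depend on how the shard pruned the tree.
	fmt.Println(absoluteAtomScore(1)) // 200
	fmt.Println(absoluteAtomScore(4)) // 320
}
```

Because the absolute score depends only on how many atoms matched, a shard that pruned away "bar", "bas", and "qux" no longer hands a single-atom match the maximum score.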


Add small test for v15 backwards compatibility (#23)

* add test for backwards compatibility for v15
* update version
* WIP use smaller index
  Change-Id: Id28f9477a400b7d5649bbc0e8a4d567813792fae
* reduce test index + use golden file
* cleanup
* update test

6 years ago

merging: support exploding compound shards (#271)

This change lets us split a compound shard into its constituent repos. In the future, this should happen instead of deleting compound shards that are too small. For now, the feature is behind a feature flag: to activate it, place a file named EXPLODE in the index directory.

4 years ago

ranking: add document ranks to shards (#449)

We persist document ranks in the shards and sort file matches based on the rankings determined by the document ranks and match scores.

Co-authored-by: Keegan Carruthers-Smith <keegan.csmith@gmail.com>

3 years ago
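Sorting file matches by a ranking that combines a per-document rank with a match score can be sketched as follows. This is an illustration under loudly stated assumptions: the FileMatch struct here is a trimmed-down hypothetical, and the linear blend with rankWeight is an invented combination. The commit message does not say how #449 combines the two signals.

```go
package main

import (
	"fmt"
	"sort"
)

// FileMatch is a hypothetical, trimmed-down stand-in for a search result.
type FileMatch struct {
	FileName string
	Score    float64 // match score from query evaluation
	Rank     float64 // hypothetical document rank read from the shard, in [0, 1]
}

// rankWeight is a hypothetical weight for blending the document rank
// into the match score.
const rankWeight = 10.0

// combinedScore blends the match score and the document rank linearly.
func combinedScore(m FileMatch) float64 {
	return m.Score + rankWeight*m.Rank
}

// sortFileMatches orders matches by descending combined score and breaks
// ties by file name so the output is deterministic.
func sortFileMatches(ms []FileMatch) {
	sort.Slice(ms, func(i, j int) bool {
		ci, cj := combinedScore(ms[i]), combinedScore(ms[j])
		if ci != cj {
			return ci > cj
		}
		return ms[i].FileName < ms[j].FileName
	})
}

func main() {
	ms := []FileMatch{
		{FileName: "a.go", Score: 100, Rank: 0.1},
		{FileName: "b.go", Score: 95, Rank: 0.9},
		{FileName: "c.go", Score: 100, Rank: 0.0},
	}
	sortFileMatches(ms)
	for _, m := range ms {
		fmt.Println(m.FileName, combinedScore(m))
	}
}
```

With this blend, a highly ranked document (b.go) can overtake files with slightly better match scores. Persisting the rank in the shard means it is available at sort time without a separate lookup.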

ci: add shellcheck step (#316)

4 years ago