fix: don't modify finalCands (#773) · boltless.me/zoekt@72f9500

fork of https://github.com/sourcegraph/zoekt

fix: don't modify finalCands (#773)

While working on ranking, I noticed that sum-tf is wrong when a file has both filename and content matches.

We use `finalCands` in our BM25 scoring. However, `finalCands` is modified in `fillChunkMatches` and `fillMatches`, which can lead to surprising scores.

Test plan:
updated unit test

author: Stefan Hengl
committer: GitHub
date: May 1, 2024, 9:21 AM +0200
commit: 72f95004e6d6136fed7bf973616e89e6d37e4eaa
parent: 68d04651cc8e4989e64ad72470fe2bc27efb91c2

+4 -4

2 changed files

+2 -2

build/scoring_test.go

@@ -77,8 +77,8 @@
 			query: &query.Substring{Pattern: "example"},
 			content: exampleJava,
 			language: "Java",
-			// keyword-score:1.63 (sum-tf: 6.00, length-ratio: 2.00)
-			wantScore: 1.63,
+			// keyword-score:1.69 (sum-tf: 7.00, length-ratio: 2.00)
+			wantScore: 1.69,
 		}, {
 			// Matches only on content
 			fileName: "example.java",

+2 -2

contentprovider.go

@@ -147,7 +147,7 @@
 // returned by the API it needs to be copied.
 func (p *contentProvider) fillMatches(ms []*candidateMatch, numContextLines int, language string, debug bool) []LineMatch {
 	var filenameMatches []*candidateMatch
-	contentMatches := ms[:0]
+	contentMatches := make([]*candidateMatch, 0, len(ms))
 
 	for _, m := range ms {
 		if m.fileName {
@@ -194,7 +194,7 @@
 // returned by the API it needs to be copied.
 func (p *contentProvider) fillChunkMatches(ms []*candidateMatch, numContextLines int, language string, debug bool) []ChunkMatch {
 	var filenameMatches []*candidateMatch
-	contentMatches := ms[:0]
+	contentMatches := make([]*candidateMatch, 0, len(ms))
 
 	for _, m := range ms {
 		if m.fileName {
