fork of https://github.com/sourcegraph/zoekt

index: skip max trigram check if content is too small (#430)

When viewing profiles for zoekt-git-index on the sourcegraph repo, the
map assignment for the trigram check in CheckText accounted for 12% of CPU
time. However, we don't need to do this work at all if the file is too
small to exceed the threshold. A file can contain at most
len(file) - 3 + 1 trigrams, so if that upper bound is not greater than
maxTrigramCount, the trigram check can never fail.
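To make the bound concrete, here is a minimal standalone sketch (not zoekt code) that counts the distinct rune trigrams in a buffer and compares the count with len(content) - ngramSize + 1. Decoding UTF-8 can only reduce the number of window positions, since a multi-byte rune consumes several bytes, so a bound based on byte length also holds for rune trigrams. The helpers distinctTrigrams and trigramUpperBound are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ngramSize is the trigram length from the commit; the constant is
// redeclared here so the sketch is self-contained.
const ngramSize = 3

// trigramUpperBound (hypothetical helper): a buffer of n bytes has at most
// n - ngramSize + 1 trigram start positions, so it can contain at most that
// many distinct trigrams.
func trigramUpperBound(content []byte) int {
	return len(content) - ngramSize + 1
}

// distinctTrigrams (hypothetical helper) slides a 3-rune window over the
// UTF-8 decoded content and counts the distinct windows it sees.
func distinctTrigrams(content []byte) int {
	seen := map[[3]rune]struct{}{}
	var window [3]rune
	n := 0 // runes decoded so far
	for len(content) > 0 {
		r, sz := utf8.DecodeRune(content)
		content = content[sz:]
		window[0], window[1], window[2] = window[1], window[2], r
		n++
		if n >= ngramSize {
			// The window is full, so it holds a real trigram.
			seen[window] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	for _, s := range []string{"abcdef", "aaaaaa", "héllo wörld"} {
		b := []byte(s)
		fmt.Printf("%q: distinct=%d bound=%d\n", s, distinctTrigrams(b), trigramUpperBound(b))
	}
}
```

For "abcdef" the bound is tight (4 distinct trigrams, bound 4). For "héllo wörld" it is loose (9 distinct, bound 11) because é and ö each take two bytes.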

Here is the effect of the change, measured with hyperfine while indexing the
sourcegraph repo. I didn't use a Go benchmark since I wanted a real-world example.

$ hyperfine -w 1 \
'./zoekt-git-index-base -incremental=false -disable_ctags=true .' \
'./zoekt-git-index-trigram -incremental=false -disable_ctags=true .'
..snip..
Summary
'./zoekt-git-index-trigram -incremental=false -disable_ctags=true .' ran
1.10 ± 0.01 times faster than './zoekt-git-index-base -incremental=false -disable_ctags=true .'

Test Plan: go test and manual testing

+12 -5
indexbuilder.go
@@ -324,7 +324,12 @@
 		return fmt.Errorf("file size smaller than %d", ngramSize)
 	}
 
-	trigrams := map[ngram]struct{}{}
+	// PERF: we only need to do the trigram check if the upperbound on content
+	// is greater than our threshold.
+	var trigrams map[ngram]struct{}
+	if trigramsUpperBound := len(content) - ngramSize + 1; trigramsUpperBound > maxTrigramCount {
+		trigrams = make(map[ngram]struct{}, maxTrigramCount+1)
+	}
 
 	var cur [3]rune
 	byteCount := 0
@@ -343,10 +348,12 @@
 			continue
 		}
 
-		trigrams[runesToNGram(cur)] = struct{}{}
-		if len(trigrams) > maxTrigramCount {
-			// probably not text.
-			return fmt.Errorf("number of trigrams exceeds %d", maxTrigramCount)
+		if trigrams != nil {
+			trigrams[runesToNGram(cur)] = struct{}{}
+			if len(trigrams) > maxTrigramCount {
+				// probably not text.
+				return fmt.Errorf("number of trigrams exceeds %d", maxTrigramCount)
+			}
 		}
 	}
 	return nil
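The patched logic can be exercised in isolation with the simplified, hypothetical checkText below. It is a sketch, not the real CheckText: it keys the map on [3]rune directly instead of zoekt's ngram type and runesToNGram helper, and it omits the lines the diff elides between the two hunks, so its loop only decodes runes and maintains the window.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ngramSize matches the trigram size in the commit; redeclared so the
// sketch is self-contained.
const ngramSize = 3

// checkText is a hypothetical, simplified version of the patched CheckText.
func checkText(content []byte, maxTrigramCount int) error {
	if len(content) == 0 {
		return nil
	}
	if len(content) < ngramSize {
		return fmt.Errorf("file size smaller than %d", ngramSize)
	}

	// Only allocate the map when the content could hold more than
	// maxTrigramCount distinct trigrams; otherwise it stays nil.
	var trigrams map[[3]rune]struct{}
	if upper := len(content) - ngramSize + 1; upper > maxTrigramCount {
		trigrams = make(map[[3]rune]struct{}, maxTrigramCount+1)
	}

	var cur [3]rune
	runes := 0
	for len(content) > 0 {
		r, sz := utf8.DecodeRune(content)
		content = content[sz:]
		cur[0], cur[1], cur[2] = cur[1], cur[2], r
		runes++
		if runes < ngramSize {
			// Window not full yet.
			continue
		}
		// With a nil map the insert and the count check are skipped.
		if trigrams != nil {
			trigrams[cur] = struct{}{}
			if len(trigrams) > maxTrigramCount {
				// probably not text.
				return fmt.Errorf("number of trigrams exceeds %d", maxTrigramCount)
			}
		}
	}
	return nil
}

func main() {
	// 6 bytes: at most 4 trigrams, so with a limit of 4 the map is never built.
	fmt.Println(checkText([]byte("abcdef"), 4))
	// 7 bytes: upper bound 5 > 3, so the map is built and the 4th trigram trips it.
	fmt.Println(checkText([]byte("abcdefg"), 3))
}
```

Skipping the map does not change results. When the bound is at most maxTrigramCount, len(trigrams) could never exceed the limit, so the error path was already unreachable. The capacity hint maxTrigramCount+1 is the largest size the map can reach before the function returns.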