fork of https://github.com/sourcegraph/zoekt


score: experimental extension novelty in sorting (#665)

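Right now we boost a file whose extension does not appear among the top two
results into the 3rd position, provided its score is at least 90% of the
score of the result it displaces. This is gated by the ZOEKT_NOVELTY_DISABLE
environment variable, so the boost is on by default. I want to explore
whether we can turn this behaviour on through the query language.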

Test Plan: go run ./cmd/zoekt foo

+55 -3
contentprovider.go
···
 	"bytes"
 	"fmt"
 	"log"
+	"os"
+	"path"
 	"sort"
 	"strings"
 	"unicode/utf8"
+
+	"golang.org/x/exp/slices"
 )
 
 var _ = log.Println
···
 	sort.Sort(chunkMatchScoreSlice(ms))
 }
 
-// SortFiles sorts files matches. The order depends on the match score, which includes both
-// query-dependent signals like word overlap, and file-only signals like the file ranks (if
-// file ranks are enabled).
+var doNovelty = os.Getenv("ZOEKT_NOVELTY_DISABLE") == ""
+
+// SortFiles sorts files matches in the order we want to present results to
+// users. The order depends on the match score, which includes both
+// query-dependent signals like word overlap, and file-only signals like the
+// file ranks (if file ranks are enabled).
+//
+// We don't only use the scores, we will also boost some results to present
+// files with novel extensions.
 func SortFiles(ms []FileMatch) {
 	sort.Sort(fileMatchesByScore(ms))
+
+	if doNovelty {
+		// Experimentally boost something into the third filematch
+		boostNovelExtension(ms, 2, 0.9)
+	}
+}
+
+func boostNovelExtension(ms []FileMatch, boostOffset int, minScoreRatio float64) {
+	if len(ms) <= boostOffset+1 {
+		return
+	}
+
+	top := ms[:boostOffset]
+	candidates := ms[boostOffset:]
+
+	// Don't bother boosting something which is significantly different to the
+	// result it replaces.
+	minScoreForNovelty := candidates[0].Score * minScoreRatio
+
+	// We want to look for an ext that isn't in the top exts
+	exts := make([]string, len(top))
+	for i := range top {
+		exts[i] = path.Ext(top[i].FileName)
+	}
+
+	for i := range candidates {
+		// Do not assume sorted due to boostNovelExtension being called on subsets
+		if candidates[i].Score < minScoreForNovelty {
+			continue
+		}
+
+		if slices.Contains(exts, path.Ext(candidates[i].FileName)) {
+			continue
+		}
+
+		// Found what we are looking for, now boost to front of candidates (which
+		// is ms[boostOffset])
+		for ; i > 0; i-- {
+			candidates[i], candidates[i-1] = candidates[i-1], candidates[i]
+		}
+		return
+	}
 }
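To see what `boostNovelExtension` does to a result list, here is a minimal standalone sketch under a few assumptions. `FileMatch` is a hypothetical stripped-down type keeping only the `FileName` and `Score` fields the function reads; the real zoekt type has many more fields. `sort.SliceStable` by descending score stands in for `fileMatchesByScore`, whose exact ordering is not shown in the diff. A small `contains` loop replaces `golang.org/x/exp/slices.Contains` so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// FileMatch is a hypothetical stand-in for zoekt's FileMatch, keeping only
// the two fields boostNovelExtension reads.
type FileMatch struct {
	FileName string
	Score    float64
}

// contains replaces slices.Contains to stay within the standard library.
func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// boostNovelExtension follows the logic in the diff: find the first candidate
// at or after boostOffset whose extension is not among the top boostOffset
// results and whose score is at least minScoreRatio times the score of the
// match currently at boostOffset, then move it into position boostOffset.
func boostNovelExtension(ms []FileMatch, boostOffset int, minScoreRatio float64) {
	if len(ms) <= boostOffset+1 {
		return // fewer than two candidates: nothing to reorder
	}
	top, candidates := ms[:boostOffset], ms[boostOffset:]
	minScore := candidates[0].Score * minScoreRatio

	exts := make([]string, len(top))
	for i := range top {
		exts[i] = path.Ext(top[i].FileName)
	}

	for i := range candidates {
		if candidates[i].Score < minScore || contains(exts, path.Ext(candidates[i].FileName)) {
			continue
		}
		// Bubble the match to the front of candidates; every match it passes
		// shifts down one slot, keeping their relative order.
		for ; i > 0; i-- {
			candidates[i], candidates[i-1] = candidates[i-1], candidates[i]
		}
		return
	}
}

func main() {
	ms := []FileMatch{
		{"a.go", 10}, {"b.go", 9.8}, {"c.go", 9.5},
		{"d.go", 9.2}, {"README.md", 9.0}, {"e.py", 5},
	}
	// Stand-in for sort.Sort(fileMatchesByScore(ms)): highest score first.
	sort.SliceStable(ms, func(i, j int) bool { return ms[i].Score > ms[j].Score })
	boostNovelExtension(ms, 2, 0.9)
	for _, m := range ms {
		fmt.Println(m.FileName, m.Score)
	}
}
```

With `boostOffset` 2 and `minScoreRatio` 0.9, the cutoff is 9.5 × 0.9 = 8.55. `c.go` and `d.go` are skipped because `.go` already appears in the top two results. `README.md` (9.0) clears the cutoff and moves into the third slot ahead of `c.go` and `d.go`. `e.py` (5) could never qualify. Bubbling with adjacent swaps, rather than a single swap, keeps the displaced matches in their original score order.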