ranking: add IDF to BM25 score calculation (#788)
Until now, our BM25 score function did not include IDF. Zoekt uses a
trigram index and hence doesn't compute document frequency during
indexing. We could add this information to the index, but it is not
immediately obvious how to tokenize code in a way that is compatible
with tokens from a natural language query.
Here we calculate the document frequency at query time under the
assumption that we visit all documents containing any of the query terms.
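For illustration, a minimal sketch of the idea, not the code in this change.
The `doc` type and all function names are hypothetical, the query terms are
assumed to be distinct, and the IDF form and the constants k1 = 1.2, b = 0.75
are the commonly used BM25 defaults rather than values taken from this commit:

```go
package main

import "math"

// doc is a hypothetical stand-in for a visited document.
type doc struct {
	length float64        // document length used for BM25 length normalization
	tf     map[string]int // term -> number of occurrences in this document
}

// documentFrequency counts, for each query term, how many of the visited
// documents contain it. This is only a true document frequency if every
// document containing any query term is visited. Terms are assumed distinct.
func documentFrequency(docs []doc, terms []string) map[string]int {
	df := make(map[string]int, len(terms))
	for _, d := range docs {
		for _, t := range terms {
			if d.tf[t] > 0 {
				df[t]++
			}
		}
	}
	return df
}

// idf uses the common smoothed form log(1 + (n - df + 0.5) / (df + 0.5)),
// where n is the total number of documents (assumed to be known).
func idf(df, n int) float64 {
	return math.Log(1.0 + (float64(n)-float64(df)+0.5)/(float64(df)+0.5))
}

// bm25 sums the IDF-weighted, length-normalized term frequency over all
// query terms for a single document.
func bm25(d doc, terms []string, df map[string]int, n int, avgLen float64) float64 {
	const k1, b = 1.2, 0.75 // assumed defaults, not taken from this commit
	lenNorm := k1 * (1 - b + b*d.length/avgLen)
	score := 0.0
	for _, t := range terms {
		tf := float64(d.tf[t])
		score += idf(df[t], n) * tf * (k1 + 1) / (tf + lenNorm)
	}
	return score
}
```

In this sketch, scoring takes two passes: `documentFrequency` runs over all
visited documents first, and `bm25` is then called per document with the
resulting counts. A rare query term contributes more to the score than a
common one.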
Notes:
- Also fixed an off-by-one bug in how we count documents.
Test plan:
- Updated unit test
- Context evaluation results are slightly worse: 63/89, down from 64/89