Indexing: improve doc content checks (#688)
This change makes a couple of improvements to the doc content checks during indexing. When a large file is explicitly marked as "allowed", we don't enforce the max trigram count. However, we still iterated through all of its trigrams and collected them in a map. Now we short-circuit the check so that the trigrams of an allowed file are not counted at all.
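A rough sketch of the short-circuit, for illustration only. The names `ngram`, `checkTrigrams`, `maxTrigrams`, and `allowLarge` are hypothetical and are not taken from the actual code; the sketch only assumes that the check counts distinct trigrams in a map and compares the count against a limit.

```go
package main

import "fmt"

// ngram is a hypothetical stand-in for a trigram key (three runes).
type ngram [3]rune

// checkTrigrams is a hypothetical helper, not the real indexing code.
// It reports an error when content has more than maxTrigrams distinct
// trigrams. When allowLarge is set, the limit is not enforced, so it
// returns before scanning the content or touching a map.
func checkTrigrams(content []rune, maxTrigrams int, allowLarge bool) error {
	if allowLarge {
		// Short-circuit: nothing to enforce, so skip the trigram scan.
		return nil
	}

	seen := make(map[ngram]struct{}, maxTrigrams)
	for i := 0; i+3 <= len(content); i++ {
		seen[ngram{content[i], content[i+1], content[i+2]}] = struct{}{}
	}
	if len(seen) > maxTrigrams {
		return fmt.Errorf("too many trigrams: %d > %d", len(seen), maxTrigrams)
	}
	return nil
}

func main() {
	text := []rune("abcdefgh")

	// The limit is enforced for ordinary files.
	fmt.Println(checkTrigrams(text, 4, false))

	// An explicitly allowed file skips the scan entirely.
	fmt.Println(checkTrigrams(text, 4, true))
}
```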
We now also reuse the trigram map across documents. This makes sense, as it's always presized with the same capacity hint. This doesn't have a significant effect on indexing speed, but it significantly reduces allocations.
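The map reuse could look roughly like the sketch below. The `checker` type, its fields, and `countTrigrams` are hypothetical names chosen for illustration; the only assumption is that one map is allocated with a fixed capacity hint and emptied before each document instead of being reallocated.

```go
package main

import "fmt"

// ngram is a hypothetical stand-in for a trigram key (three runes).
type ngram [3]rune

// checker is a hypothetical type that owns one trigram set and reuses it
// for every document, rather than allocating a new map per document.
type checker struct {
	seen map[ngram]struct{}
}

// newChecker presizes the map once with the given capacity hint.
func newChecker(capacityHint int) *checker {
	return &checker{seen: make(map[ngram]struct{}, capacityHint)}
}

// countTrigrams returns the number of distinct trigrams in content.
// It empties the shared map first; deleting the keys keeps the map's
// allocated storage, so later documents do not pay for a new allocation.
func (c *checker) countTrigrams(content []rune) int {
	for k := range c.seen {
		delete(c.seen, k)
	}
	for i := 0; i+3 <= len(content); i++ {
		c.seen[ngram{content[i], content[i+1], content[i+2]}] = struct{}{}
	}
	return len(c.seen)
}

func main() {
	c := newChecker(1000)
	for _, doc := range []string{"abcdefgh", "aaaaaa", "hello world"} {
		fmt.Println(doc, c.countTrigrams([]rune(doc)))
	}
}
```

The trade-off in this sketch is that the shared map makes the checker unsafe for concurrent use; each goroutine would need its own instance.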