chunkmatches: reuse last calculated column when filling (#711)
This change uses the fact that candidate matches arrive in increasing byte
offset to avoid recounting runes on a line. Before this change, if a line had
many matches we would call `utf8.RuneCount` from the start of the line for
each match, which is an `O(nm)` algorithm where `n` is the line length and `m`
is the number of matches. After this change the complexity is `O(n)`.
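The idea can be sketched roughly as below. This is an illustration, not the
code in this commit: the `columnHelper` type, its fields, and the `get` method
are hypothetical names, and the 1-based column and the fallback for
out-of-order offsets are assumptions.

```go
package main

import "unicode/utf8"

// columnHelper (hypothetical name) remembers the rune count up to the last
// byte offset it was asked about, so later queries only count the new bytes.
type columnHelper struct {
	data       []byte // bytes of the line being filled
	lastOffset int    // byte offset of the previous query
	lastRunes  int    // number of runes in data[:lastOffset]
}

// get returns the rune column (assumed 1-based) for a byte offset into data.
// Offsets are expected to be non-decreasing across calls.
func (c *columnHelper) get(offset int) int {
	if offset < c.lastOffset {
		// Assumption: on an out-of-order offset, start over from the
		// beginning of the line rather than returning a wrong column.
		c.lastOffset, c.lastRunes = 0, 0
	}
	// Count only the runes between the previous offset and this one.
	c.lastRunes += utf8.RuneCount(c.data[c.lastOffset:offset])
	c.lastOffset = offset
	return c.lastRunes + 1
}
```

With increasing offsets each byte of the line is counted at most once, so the
total work is bounded by the line length rather than line length times
matches.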
I came across this while investigating slow performance for searching the
string "dev" on s2, which took 2s if the match limits were 100k instead of 10k.
With 10k it would take 0.04s. It turns out that with the larger limit we ended
up searching a file where the word dev appeared many times on one line. Running
a profiler against the service showed 96% of CPU time in `utf8.RuneCount`.
This commit adds a benchmark for the helper introduced to reuse RuneCounts.
Unsurprisingly the difference is massive between `O(nm)` and `O(n)` :)
name             old time/op  new time/op  delta
ColumnHelper-32  299ms ± 2%   0ms ± 2%     -99.97%  (p=0.000 n=10+10)
Test Plan: Added tests and benchmarks.