build: faster newLinesIndices via bytes.IndexByte and buffer re-use (#680)

First, we use bytes.IndexByte to speed up newLinesIndices. On my machine this
reduces the wall-clock time of BenchmarkTagsToSections by 38%. It is faster
because bytes.IndexByte relies on CPU-specific optimizations to find the next
newline (e.g. it uses AVX2 if available).
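
For illustration only (this is not the actual diff), the scanning loop could
look roughly like the sketch below. newLineOffsets is a hypothetical stand-in
for newLinesIndices, and the []uint32 return type is an assumption. The only
library call it relies on is bytes.IndexByte.

```go
package main

import "bytes"

// newLineOffsets is a hypothetical stand-in for newLinesIndices. It returns
// the byte offset of every '\n' in content.
func newLineOffsets(content []byte) []uint32 {
	var offsets []uint32
	base := 0 // offset of the unscanned remainder within content
	for {
		// bytes.IndexByte does the search, instead of a byte-by-byte Go loop.
		i := bytes.IndexByte(content[base:], '\n')
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, uint32(base+i))
		base += i + 1 // continue just past the newline we found
	}
}
```

Each iteration jumps straight to the next newline, so the per-byte work
happens inside bytes.IndexByte rather than in the caller.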

Second, we reuse the nls slice between calls to tagsToSections. The profiler
showed a not insignificant chunk of time in the garbage collector. The slice
built by newLinesIndices is allocated and thrown away on each call to
tagsToSections, so it can be reused. This commit does that by introducing a
struct that stores the buffer. We now use this buffer per shard of symbols we
analyse.
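
A minimal sketch of the buffer reuse, assuming a struct that owns the slice.
lineIndexer, buf and offsets are hypothetical names chosen for this example
and are not the identifiers used in the commit.

```go
package main

import "bytes"

// lineIndexer is a hypothetical struct that owns a reusable offsets buffer.
type lineIndexer struct {
	buf []uint32
}

// offsets returns the byte offset of every '\n' in content. The returned
// slice aliases the internal buffer, so it is only valid until the next call.
func (l *lineIndexer) offsets(content []byte) []uint32 {
	out := l.buf[:0] // keep the capacity, drop the old contents
	base := 0
	for {
		i := bytes.IndexByte(content[base:], '\n')
		if i < 0 {
			break
		}
		out = append(out, uint32(base+i))
		base += i + 1
	}
	l.buf = out // retain any growth for the next call
	return out
}
```

Once the buffer has grown to fit the largest input seen so far, later calls
append into the existing backing array and do not allocate. The trade-off is
that callers must not hold on to a result across calls.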

old time/op     new time/op     delta
188µs ± 7%      101µs ± 3%      -46.10%  (p=0.000 n=10+10)

old alloc/op    new alloc/op    delta
79.3kB ± 0%     36.3kB ± 0%     -54.24%  (p=0.000 n=9+10)

old allocs/op   new allocs/op   delta
443 ± 0%        441 ± 0%        -0.45%  (p=0.000 n=10+10)

Test Plan: go test -bench BenchmarkTagsToSections