read: inline delta-varint decoding in unmarshalDocumentSections (#375)

In the profiles of zoekt-webserver, this function accounts for as much
as 30% of CPU and 87% of allocated memory. While it seems a promising
direction to investigate better storage and/or reading strategies here,
removing unnecessary allocation and copying is an easy win for now.
Besides, there was an old TODO where I think the author had something
similar to this commit in mind.
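
For illustration, here is a minimal sketch of what decoding the deltas inline
can look like. It is not the code in this commit: the `section` type, the
`decodeSections` name, and the assumed wire layout (a uvarint count followed
by two uvarint deltas per section) are hypothetical stand-ins. The point is
only that the running sums are written straight into the result slice, so no
intermediate []uint32 has to be allocated and copied.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// section is a hypothetical stand-in for a document section:
// a byte range described by its start and end offsets.
type section struct {
	start, end uint32
}

// decodeSections reads a uvarint count n followed by 2*n uvarint deltas
// (assumed layout) and stores the running sums directly in the result.
// It returns nil if the input is truncated or malformed.
func decodeSections(data []byte) []section {
	n, k := binary.Uvarint(data)
	if k <= 0 {
		return nil
	}
	data = data[k:]

	// Every delta occupies at least one byte, so a count larger than
	// len(data)/2 cannot be satisfied; reject it before allocating.
	if n > uint64(len(data))/2 {
		return nil
	}

	out := make([]section, n) // the only allocation
	var last uint32
	for i := range out {
		d, k := binary.Uvarint(data)
		if k <= 0 {
			return nil
		}
		data = data[k:]
		last += uint32(d)
		out[i].start = last

		d, k = binary.Uvarint(data)
		if k <= 0 {
			return nil
		}
		data = data[k:]
		last += uint32(d)
		out[i].end = last
	}
	return out
}

func main() {
	// count=2, deltas 5, 3, 300 (two bytes: 0xAC 0x02), 4
	fmt.Println(decodeSections([]byte{2, 5, 3, 0xAC, 0x02, 4}))
	// prints [{5 8} {308 312}]
}
```
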

The distribution in the benchmark is synthetic, and the gap sizes come
from a (non-rigorous) average over a sample of the indexed source code I
happened to have in my local zoekt copy. However, if they are anywhere close
to real, then per the benchstat result below (roughly 25% less time spent in
a function that accounts for about 30% of CPU) I expect to see a
1.0 - (0.3 * 0.75 + 0.7) = 7.5% overall speed boost, and even better results
for memory allocations.
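
The estimate above is a plain weighted sum, which the following sketch spells
out. The `overallSaving` helper is hypothetical and only restates the
arithmetic; the 0.30 share and the 0.25 per-function saving are the rough
figures used in this message, not new measurements.

```go
package main

import "fmt"

// overallSaving estimates the fraction of total CPU time saved when a
// function responsible for `share` of the time gets `saving` faster
// (saving = 0.25 means it now takes 75% of its previous time).
func overallSaving(share, saving float64) float64 {
	return 1.0 - (share*(1.0-saving) + (1.0 - share))
}

func main() {
	// 30% of CPU, about 25% faster: 1.0 - (0.3*0.75 + 0.7)
	fmt.Printf("%.1f%%\n", overallSaving(0.30, 0.25)*100) // prints 7.5%
}
```
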

name                             old time/op    new time/op  delta
UnmarshalDocSections/10-10       80.1ns ± 1%    52.5ns ± 1%  -34.53%  (p=0.000 n=10+9)
UnmarshalDocSections/100-10       602ns ± 1%     421ns ± 1%  -30.17%  (p=0.000 n=10+9)
UnmarshalDocSections/1000-10     7.30µs ± 1%    5.30µs ± 2%  -27.41%  (p=0.000 n=10+10)
UnmarshalDocSections/10000-10    72.2µs ± 2%    59.9µs ± 1%  -17.02%  (p=0.000 n=10+9)

name                            old alloc/op   new alloc/op  delta
UnmarshalDocSections/10-10         160B ± 0%       80B ± 0%  -50.00%  (p=0.000 n=10+10)
UnmarshalDocSections/100-10      1.79kB ± 0%    0.90kB ± 0%  -50.00%  (p=0.000 n=10+10)
UnmarshalDocSections/1000-10     16.4kB ± 0%     8.2kB ± 0%  -50.00%  (p=0.000 n=10+10)
UnmarshalDocSections/10000-10     164kB ± 0%      82kB ± 0%  -50.00%  (p=0.000 n=10+10)

name                           old allocs/op  new allocs/op  delta
UnmarshalDocSections/10-10         2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
UnmarshalDocSections/100-10        2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
UnmarshalDocSections/1000-10       2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
UnmarshalDocSections/10000-10      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)