fork of https://github.com/sourcegraph/zoekt

index: 3.7x faster posting list construction via direct-indexed ASCII array (#1020)

* index: reduce GC pressure in posting list construction by 1.8x

Three targeted changes around newSearchableString, the hot path where
~54% of CPU was spent in runtime memory management and map assignment
(memclr, memmove, madvise, mapassign):

1. Merge postings and lastOffsets maps into a single map[ngram]*postingList.
Pointer-stored values mean the map is only written when a new ngram is
first seen (~282K writes for kubernetes) rather than on every trigram
occurrence (~169M). This cuts per-trigram map operations from 4 (two
lookups, two assignments) to ~1 (one lookup).

2. Pre-size the map using estimateNgrams(shardMaxBytes) and pre-allocate
each posting list to 1024 bytes, reducing slice growth events and
eliminating map rehashing.

3. Pool postingsBuilder instances via sync.Pool on the Builder, so
sequential and parallel shard builds reuse map and slice allocations
across shards instead of re-creating them.
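Changes 1 and 3 can be sketched together. This is a minimal illustration under stated assumptions: miniBuilder, getBuilder, and the mapWrites counter are hypothetical stand-ins for postingsBuilder and Builder.getPostingsBuilder, and only the postingList shape is taken from the diff. It shows that a pointer-valued map is written once per distinct ngram, and that a reset builder keeps its entries and slice capacity for the next shard.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sync"
)

type ngram uint64

// postingList mirrors the struct in this commit: varint-encoded deltas plus
// the last offset, stored by pointer so appends never rewrite the map entry.
type postingList struct {
	data    []byte
	lastOff uint32
}

// miniBuilder is a hypothetical, stripped-down stand-in for postingsBuilder.
type miniBuilder struct {
	postings  map[ngram]*postingList
	mapWrites int // counts map assignments, for illustration only
}

func newMiniBuilder(sizeHint int) *miniBuilder {
	return &miniBuilder{postings: make(map[ngram]*postingList, sizeHint)}
}

// add records one occurrence of ng at offset off. The map is written only the
// first time ng is seen; later occurrences mutate the list through the pointer.
func (b *miniBuilder) add(ng ngram, off uint32) {
	pl := b.postings[ng]
	if pl == nil {
		pl = &postingList{data: make([]byte, 0, 64)}
		b.postings[ng] = pl
		b.mapWrites++
	}
	pl.data = binary.AppendUvarint(pl.data, uint64(off-pl.lastOff))
	pl.lastOff = off
}

// reset truncates every list but keeps map entries and backing arrays.
func (b *miniBuilder) reset() {
	for _, pl := range b.postings {
		pl.data = pl.data[:0]
		pl.lastOff = 0
	}
	b.mapWrites = 0
}

// pool plays the role of Builder.postingsPool (hypothetical package-level var).
var pool sync.Pool

// getBuilder follows the shape of Builder.getPostingsBuilder: reuse and reset
// a pooled builder if one is available, otherwise allocate a fresh one.
func getBuilder() *miniBuilder {
	if b, ok := pool.Get().(*miniBuilder); ok {
		b.reset()
		return b
	}
	return newMiniBuilder(1024)
}

func main() {
	b := getBuilder()
	for i, ng := range []ngram{7, 8, 7, 7, 9} {
		b.add(ng, uint32(i))
	}
	fmt.Println(len(b.postings), b.mapWrites) // 3 distinct ngrams, 3 map writes
	pool.Put(b)
}
```

Because sync.Pool may drop idle items during garbage collection, pooling is a best-effort saving: a caller can always receive a freshly allocated builder instead of a reused one.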

Benchmarked on kubernetes (22.9K files, 169 MB, Apple M1 Max):

Cold path (NewSearchableString):
Time: 9.3s → 5.3s (-43%)
Allocs: 901K → 677K (-25%)
B/op: 1358 → 1536 MB (+13%, from larger pre-alloc)

Warm path (pooled reuse across shards):
Time: 9.3s → 5.1s (-45%)
Allocs: 901K → 23K (-97%)
B/op: 1358 → 0.6 MB (-99.96%)

WritePostings: ~155ms, unchanged.

* index: direct-indexed array for ASCII trigrams, 3.7x faster builds

Replace map lookups with a direct-indexed [2M]*postingList array for
ASCII trigrams (3 × 7-bit runes = 21-bit index; 2^21 slots × 8-byte
pointers = 16 MB on 64-bit platforms).
Since 99%+ of trigrams in source code are pure ASCII, this eliminates
nearly all hash and probe overhead from the hot loop. Non-ASCII
trigrams still fall back to the map.
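The 21-bit packing can be checked in isolation. The sketch below copies asciiNgramIndex and asciiIndexToNgram from the diff; runesToNGram is an assumed reconstruction of the canonical encoding described in the asciiIndexToNgram doc comment (rune[0]<<42 | rune[1]<<21 | rune[2]). Because both encodings place the three runes in the same most-to-least-significant order, the packed index is also monotonic in ngram order.

```go
package main

import "fmt"

type ngram uint64

// runesToNGram is assumed to follow the canonical encoding
// rune[0]<<42 | rune[1]<<21 | rune[2] (21 bits per rune).
func runesToNGram(r [3]rune) ngram {
	return ngram(uint64(r[0])<<42 | uint64(r[1])<<21 | uint64(r[2]))
}

// asciiNgramIndex packs three 7-bit ASCII bytes into a 21-bit array index.
func asciiNgramIndex(a, b, c byte) uint32 {
	return uint32(a)<<14 | uint32(b)<<7 | uint32(c)
}

// asciiIndexToNgram widens each 7-bit field back to a 21-bit rune slot.
func asciiIndexToNgram(idx uint32) ngram {
	r0 := uint64(idx >> 14)
	r1 := uint64((idx >> 7) & 0x7f)
	r2 := uint64(idx & 0x7f)
	return ngram(r0<<42 | r1<<21 | r2)
}

func main() {
	idx := asciiNgramIndex('f', 'o', 'o')
	fmt.Println(idx, asciiIndexToNgram(idx) == runesToNGram([3]rune{'f', 'o', 'o'}))
}
```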

Also inline the ASCII check (data[0] < utf8.RuneSelf) to avoid
utf8.DecodeRune function call overhead on the 95-99% of bytes that
are ASCII.

In writePostings, collect ngrams from both the array and map into a
single sorted slice for writing. The on-disk format is unchanged.
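The collection step can be sketched as follows. collectSorted is a hypothetical helper that follows the shape of the new writePostings prologue; a 2M-entry slice stands in for the fixed-size array, and runesToNGram assumes the canonical 21-bits-per-rune encoding. Empty lists (slots retained from an earlier shard) are skipped, and the final sort restores ngram order regardless of insertion order.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

type ngram uint64

type postingList struct {
	data    []byte
	lastOff uint32
}

type ngramPosting struct {
	ng ngram
	pl *postingList
}

// runesToNGram assumes the encoding rune[0]<<42 | rune[1]<<21 | rune[2].
func runesToNGram(r [3]rune) ngram {
	return ngram(uint64(r[0])<<42 | uint64(r[1])<<21 | uint64(r[2]))
}

// asciiIndexToNgram is copied from the diff.
func asciiIndexToNgram(idx uint32) ngram {
	r0 := uint64(idx >> 14)
	r1 := uint64((idx >> 7) & 0x7f)
	r2 := uint64(idx & 0x7f)
	return ngram(r0<<42 | r1<<21 | r2)
}

// collectSorted gathers non-empty lists from the sparse ASCII index and the
// non-ASCII map, then sorts them by ngram value.
func collectSorted(asciiPostings []*postingList, populated []uint32, nonASCII map[ngram]*postingList) []ngramPosting {
	all := make([]ngramPosting, 0, len(populated)+len(nonASCII))
	for _, idx := range populated {
		if pl := asciiPostings[idx]; len(pl.data) > 0 {
			all = append(all, ngramPosting{asciiIndexToNgram(idx), pl})
		}
	}
	for ng, pl := range nonASCII {
		if len(pl.data) > 0 {
			all = append(all, ngramPosting{ng, pl})
		}
	}
	slices.SortFunc(all, func(a, b ngramPosting) int { return cmp.Compare(a.ng, b.ng) })
	return all
}

func main() {
	asciiPostings := make([]*postingList, 1<<21)
	var populated []uint32
	for _, tri := range []string{"zoo", "abc"} { // insertion order is not sorted
		idx := uint32(tri[0])<<14 | uint32(tri[1])<<7 | uint32(tri[2])
		asciiPostings[idx] = &postingList{data: []byte{1}}
		populated = append(populated, idx)
	}
	nonASCII := map[ngram]*postingList{
		runesToNGram([3]rune{'é', 't', 'é'}): {data: []byte{1}},
	}
	for _, np := range collectSorted(asciiPostings, populated, nonASCII) {
		fmt.Printf("%c%c%c\n", rune(np.ng>>42), rune((np.ng>>21)&0x1fffff), rune(np.ng&0x1fffff))
	}
}
```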

Benchmarked on kubernetes (22.9K files, 169 MB, Apple M1 Max):

Cold path (NewSearchableString):
Before (with map opt): 5.3s, 677K allocs, 1536 MB
After: 2.5s, 676K allocs, 1544 MB
Speedup: 2.1x (3.7x vs original baseline)

Warm path (pooled reuse):
Before (with map opt): 5.1s, 23K allocs, 0.6 MB
After: 2.3s, 23K allocs, 0.6 MB
Speedup: 2.2x (4.0x vs original baseline)

WritePostings: ~130ms, unchanged.

The non-ASCII map now holds only ~6K entries (vs ~282K before), since
the vast majority of trigrams are served by the direct array.

* index: fix stale comments after ASCII array introduction

Update three comments that still referenced map-only storage after the
direct-indexed ASCII array was added: postingList doc, estimateNgrams
doc, and reset() doc.

* index: address review feedback from keegancsmith

- Replace putPostingsBuilder with returnPostingsBuilders(*ShardBuilder)
that returns both content and name builders to the pool and nils the
fields, so any subsequent misuse panics obviously.

- Drop error-path pool returns in newShardBuilder: setRepository errors
are extremely rare (invalid templates or >64 branches), not worth the
code complexity.

- Thread shardMax through newShardBuilder(shardMax int). Callers without
a shard size context (merge, tests, public API) pass 0 for the default
(100 MB). Builder.newShardBuilder passes b.opts.ShardMax via the pool.

- Switch sort.Slice to slices.SortFunc in writePostings for type safety
and to avoid interface boxing overhead.
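The default-and-size arithmetic is small enough to check directly. This sketch copies estimateNgrams and defaultShardMax from the diff and wraps the `shardMax <= 0` fallback in a hypothetical helper, effectiveShardMax (in the diff the check is inlined in newShardBuilder). With the 100 MB default, the map hint is 104,857,600 / 600 = 174,762 entries.

```go
package main

import "fmt"

const defaultShardMax = 100 << 20 // 100 MB, as in the diff

// effectiveShardMax applies the "0 means default" convention (hypothetical name).
func effectiveShardMax(shardMax int) int {
	if shardMax <= 0 {
		return defaultShardMax
	}
	return shardMax
}

// estimateNgrams is copied from the diff: one map slot per 600 bytes of
// shard content, with a floor of 1024.
func estimateNgrams(shardMaxBytes int) int {
	n := shardMaxBytes / 600
	if n < 1024 {
		n = 1024
	}
	return n
}

func main() {
	for _, sm := range []int{0, 10 << 20, 100 << 20} {
		fmt.Println(sm, estimateNgrams(effectiveShardMax(sm)))
	}
}
```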

* index: sparse ASCII index, reduce initialPostingCap to 64

Address review feedback:

- Add asciiPopulated []uint32 sparse index so reset() and
writePostings iterate only populated slots (~275K) instead
of scanning all 2M. reset() truncates slots rather than
freeing them, so postingList allocations are retained for
pool reuse; the hot path spots a retained slot by
len(pl.data)==0 and re-records it in asciiPopulated.

- Reduce initialPostingCap from 1024 to 64. On kubernetes
(282K trigrams), median posting list is 10 bytes and 78%
are under 64. Drops pre-allocation waste from 244 MB to
11 MB. Cold-path B/op: 1558 MB → 1352 MB (-13%).
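The sparse-index bookkeeping can be sketched on its own. asciiStore is a hypothetical reduction of postingsBuilder to its ASCII half, with a slice standing in for the [1<<21]*postingList array. The len(pl.data)==0 test is safe because every visit appends at least one uvarint byte, so a slot can only be empty on its first visit after reset().

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type postingList struct {
	data    []byte
	lastOff uint32
}

const initialPostingCap = 64

// asciiStore holds the direct-indexed slots and the list of slots in use.
type asciiStore struct {
	slots     []*postingList // stand-in for the 2M-entry array
	populated []uint32
}

// add records an occurrence at offset off in slot idx.
func (s *asciiStore) add(idx, off uint32) {
	pl := s.slots[idx]
	if pl == nil {
		pl = &postingList{data: make([]byte, 0, initialPostingCap)}
		s.slots[idx] = pl
		s.populated = append(s.populated, idx)
	} else if len(pl.data) == 0 {
		// Retained from a previous shard: record it again for this one.
		s.populated = append(s.populated, idx)
	}
	// A uvarint is at least one byte, so len(pl.data) > 0 from here on.
	pl.data = binary.AppendUvarint(pl.data, uint64(off-pl.lastOff))
	pl.lastOff = off
}

// reset touches only populated slots and keeps their allocations.
func (s *asciiStore) reset() {
	for _, idx := range s.populated {
		pl := s.slots[idx]
		pl.data = pl.data[:0]
		pl.lastOff = 0
	}
	s.populated = s.populated[:0]
}

func main() {
	s := &asciiStore{slots: make([]*postingList, 1<<21)}
	s.add(5, 0)
	s.add(5, 9)
	s.add(7, 3)
	fmt.Println(s.populated) // [5 7]
	s.reset()
	s.add(7, 1)
	fmt.Println(s.populated, s.slots[5] != nil) // [7] true
}
```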

6 files changed, +343 -31
index/builder.go (+33 -3)

```diff
···
 	id string
 
 	finishCalled bool
+
+	// postingsPool reuses postingsBuilder instances across shard builds,
+	// retaining their map and slice allocations to avoid repeated
+	// memclr/madvise overhead.
+	postingsPool sync.Pool
 }
 
 type finishedShard struct {
···
 		}
 	}
 
-	return b.writeShard(name, shardBuilder)
+	result, err := b.writeShard(name, shardBuilder)
+	b.returnPostingsBuilders(shardBuilder)
+	return result, err
 }
 
 // CheckMemoryUsage checks the memory usage of the process and writes a memory profile if the heap usage exceeds the
···
 	}
 }
 
+func (b *Builder) getPostingsBuilder() *postingsBuilder {
+	if pb, ok := b.postingsPool.Get().(*postingsBuilder); ok {
+		pb.reset()
+		return pb
+	}
+	return newPostingsBuilder(b.opts.ShardMax)
+}
+
+// returnPostingsBuilders returns both postings builders from sb to the
+// pool and nils the fields so any subsequent misuse crashes obviously.
+func (b *Builder) returnPostingsBuilders(sb *ShardBuilder) {
+	if sb.contentPostings != nil {
+		b.postingsPool.Put(sb.contentPostings)
+		sb.contentPostings = nil
+	}
+	if sb.namePostings != nil {
+		b.postingsPool.Put(sb.namePostings)
+		sb.namePostings = nil
+	}
+}
+
 func (b *Builder) newShardBuilder() (*ShardBuilder, error) {
 	desc := b.opts.RepositoryDescription
 	desc.HasSymbols = !b.opts.DisableCTags && b.opts.CTagsPath != ""
 	desc.SubRepoMap = b.opts.SubRepositories
 	desc.IndexOptions = b.opts.GetHash()
 
-	shardBuilder, err := NewShardBuilder(&desc)
-	if err != nil {
+	content := b.getPostingsBuilder()
+	name := b.getPostingsBuilder()
+	shardBuilder := newShardBuilderWithPostings(content, name)
+	if err := shardBuilder.setRepository(&desc); err != nil {
 		return nil, err
 	}
 	shardBuilder.IndexTime = b.indexTime
```
index/index_test.go (+2 -2)

```diff
···
 func testShardBuilderCompound(t *testing.T, repos []*zoekt.Repository, docs [][]Document) *ShardBuilder {
 	t.Helper()
 
-	b := newShardBuilder()
+	b := newShardBuilder(0)
 	b.indexFormatVersion = NextIndexFormatVersion
 
 	if len(repos) != len(docs) {
···
 }
 
 func TestRepoWithMetadata(t *testing.T) {
-	sb := newShardBuilder()
+	sb := newShardBuilder(0)
 	sb.repoList = []zoekt.Repository{
 		{
 			Name: "repo1",
```
index/merge.go (+2 -2)

```diff
···
 		return ds[i].repoMetaData[0].GetPriority() > ds[j].repoMetaData[0].GetPriority()
 	})
 
-	sb := newShardBuilder()
+	sb := newShardBuilder(0)
 	sb.indexFormatVersion = NextIndexFormatVersion
 
 	for _, d := range ds {
···
 			}
 		}
 
-		sb = newShardBuilder()
+		sb = newShardBuilder(0)
 		sb.indexFormatVersion = IndexFormatVersion
 		if err := sb.setRepository(&d.repoMetaData[repoID]); err != nil {
 			return shardNames, err
```
index/postings_bench_test.go (new file, +150)

```go
package index

import (
	"bytes"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"testing"
)

// Set ZOEKT_BENCH_REPO to a source tree (e.g. a kubernetes checkout) to enable.
//
//	git clone --depth=1 https://github.com/kubernetes/kubernetes /tmp/k8s
//	ZOEKT_BENCH_REPO=/tmp/k8s go test ./index/ -bench=BenchmarkPostings -benchmem -count=5 -timeout=600s

func requireBenchRepo(b *testing.B) string {
	b.Helper()
	dir := os.Getenv("ZOEKT_BENCH_REPO")
	if dir == "" {
		b.Skip("ZOEKT_BENCH_REPO not set")
	}
	return dir
}

// loadRepoFiles walks dir and returns file contents, skipping binary files,
// empty files, and anything over 1 MB. Returns at most maxFiles entries.
func loadRepoFiles(b *testing.B, dir string, maxFiles int) [][]byte {
	b.Helper()
	var files [][]byte
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil
		}
		if d.IsDir() {
			switch d.Name() {
			case ".git", "vendor", "node_modules":
				return filepath.SkipDir
			}
			return nil
		}
		if len(files) >= maxFiles {
			return filepath.SkipAll
		}
		info, err := d.Info()
		if err != nil || info.Size() == 0 || info.Size() > 1<<20 {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return nil
		}
		if bytes.IndexByte(data, 0) >= 0 {
			return nil // binary
		}
		files = append(files, data)
		return nil
	})
	if err != nil {
		b.Fatalf("walking repo: %v", err)
	}
	if len(files) == 0 {
		b.Fatal("no files found in repo")
	}
	return files
}

func totalSize(files [][]byte) int64 {
	var n int64
	for _, f := range files {
		n += int64(len(f))
	}
	return n
}

// BenchmarkPostings_NewSearchableString measures the core hot path: trigram
// extraction, map lookups, delta encoding, and per-trigram slice growth.
// Sub-benchmarks vary corpus size to show scaling with map size.
func BenchmarkPostings_NewSearchableString(b *testing.B) {
	dir := requireBenchRepo(b)
	allFiles := loadRepoFiles(b, dir, 50_000)
	b.Logf("loaded %d files, %.1f MB", len(allFiles), float64(totalSize(allFiles))/(1<<20))

	for _, n := range []int{1_000, 5_000, len(allFiles)} {
		n = min(n, len(allFiles))
		files := allFiles[:n]
		size := totalSize(files)

		b.Run(fmt.Sprintf("files=%d", n), func(b *testing.B) {
			b.ReportAllocs()
			for b.Loop() {
				pb := newPostingsBuilder(defaultShardMax)
				for _, data := range files {
					_, _, _ = pb.newSearchableString(data, nil)
				}
			}
			b.ReportMetric(float64(size), "input-bytes/op")
		})
	}
}

// BenchmarkPostings_Reuse measures the warm path: building postings with a
// reset (pooled) postingsBuilder that retains its map and slice allocations
// from a previous shard build.
func BenchmarkPostings_Reuse(b *testing.B) {
	dir := requireBenchRepo(b)
	allFiles := loadRepoFiles(b, dir, 50_000)
	size := totalSize(allFiles)
	b.Logf("loaded %d files, %.1f MB", len(allFiles), float64(size)/(1<<20))

	// Warm up the builder so it has allocated map entries and slices.
	pb := newPostingsBuilder(defaultShardMax)
	for _, data := range allFiles {
		_, _, _ = pb.newSearchableString(data, nil)
	}

	b.ResetTimer()
	b.ReportAllocs()
	for b.Loop() {
		pb.reset()
		for _, data := range allFiles {
			_, _, _ = pb.newSearchableString(data, nil)
		}
	}
	b.ReportMetric(float64(size), "input-bytes/op")
}

// BenchmarkPostings_WritePostings measures the marshaling path: sorting ngram
// keys and writing varint-encoded posting lists.
func BenchmarkPostings_WritePostings(b *testing.B) {
	dir := requireBenchRepo(b)
	allFiles := loadRepoFiles(b, dir, 50_000)

	pb := newPostingsBuilder(defaultShardMax)
	for _, data := range allFiles {
		_, _, _ = pb.newSearchableString(data, nil)
	}
	b.Logf("built %d unique ngrams from %d files, %.1f MB", len(pb.postings), len(allFiles), float64(totalSize(allFiles))/(1<<20))

	buf := &bytes.Buffer{}
	b.ResetTimer()
	b.ReportAllocs()
	for b.Loop() {
		buf.Reset()
		w := &writer{w: buf}
		var ngramText, charOffsets, endRunes simpleSection
		var postings compoundSection
		writePostings(w, pb, &ngramText, &charOffsets, &postings, &endRunes)
	}
}
```
index/shard_builder.go (+132 -16)

```diff
···
 // Store character (unicode codepoint) offset (in bytes) this often.
 const runeOffsetFrequency = 100
 
+// postingList holds the varint-encoded delta data and last offset for a
+// single ngram. Stored by pointer in the asciiPostings array or the
+// postings map so appending to data does not require rewriting the
+// map entry or array slot.
+type postingList struct {
+	data    []byte
+	lastOff uint32
+}
+
+// asciiNgramBits is the number of bits needed to index all ASCII trigrams.
+// ASCII runes are 0-127 (7 bits), so 3 runes = 21 bits = 2M entries.
+const asciiNgramBits = 21
+
+// asciiNgramIndex packs three ASCII bytes into a 21-bit array index.
+func asciiNgramIndex(a, b, c byte) uint32 {
+	return uint32(a)<<14 | uint32(b)<<7 | uint32(c)
+}
+
+// asciiIndexToNgram converts a 21-bit ASCII array index back to the
+// canonical ngram encoding (rune[0]<<42 | rune[1]<<21 | rune[2]).
+func asciiIndexToNgram(idx uint32) ngram {
+	r0 := uint64(idx >> 14)
+	r1 := uint64((idx >> 7) & 0x7f)
+	r2 := uint64(idx & 0x7f)
+	return ngram(r0<<42 | r1<<21 | r2)
+}
+
 type postingsBuilder struct {
-	postings    map[ngram][]byte
-	lastOffsets map[ngram]uint32
+	// ASCII trigrams use direct-indexed array (zero hash/probe cost).
+	// Non-ASCII trigrams fall back to the map.
+	asciiPostings [1 << asciiNgramBits]*postingList
+	postings      map[ngram]*postingList
+
+	// asciiPopulated tracks which indices in asciiPostings are non-nil,
+	// so reset() and writePostings iterate only populated slots — O(n)
+	// where n is unique ASCII trigrams (~275K) instead of O(2M).
+	asciiPopulated []uint32
 
 	// To support UTF-8 searching, we must map back runes to byte
 	// offsets. As a first attempt, we sample regularly. The
···
 	endByte uint32
 }
 
-func newPostingsBuilder() *postingsBuilder {
+// Initial capacity for each posting list's byte slice. On the
+// kubernetes corpus (282K unique trigrams), the median posting list is
+// 10 bytes and 78% are under 64 bytes (power-law distribution).
+// Pre-allocating 64 covers the majority without the 244 MB waste that
+// a mean-based value (1024) would cause.
+const initialPostingCap = 64
+
+// estimateNgrams returns a pre-size hint for the non-ASCII postings map,
+// derived from the maximum shard content size. Intentionally over-estimates
+// (the map only holds non-ASCII trigrams) to avoid rehashing.
+func estimateNgrams(shardMaxBytes int) int {
+	n := shardMaxBytes / 600
+	if n < 1024 {
+		n = 1024
+	}
+	return n
+}
+
+func newPostingsBuilder(shardMaxBytes int) *postingsBuilder {
 	return &postingsBuilder{
-		postings:     map[ngram][]byte{},
-		lastOffsets:  map[ngram]uint32{},
+		postings:     make(map[ngram]*postingList, estimateNgrams(shardMaxBytes)),
 		isPlainASCII: true,
 	}
 }
 
+// reset clears the builder for reuse. All postingList allocations
+// (backing arrays, map entries, ASCII array slots) are retained so the
+// next shard build avoids re-allocating them.
+// Uses asciiPopulated to reset only populated slots — O(populated)
+// instead of O(2M). Slots are kept non-nil with data truncated to
+// len 0; the hot path uses len(pl.data)==0 to re-record them in
+// asciiPopulated for the next shard.
+func (s *postingsBuilder) reset() {
+	for _, idx := range s.asciiPopulated {
+		pl := s.asciiPostings[idx]
+		pl.data = pl.data[:0]
+		pl.lastOff = 0
+	}
+	s.asciiPopulated = s.asciiPopulated[:0]
+	for _, pl := range s.postings {
+		pl.data = pl.data[:0]
+		pl.lastOff = 0
+	}
+	s.runeOffsets = s.runeOffsets[:0]
+	s.runeCount = 0
+	s.isPlainASCII = true
+	s.endRunes = s.endRunes[:0]
+	s.endByte = 0
+}
+
 // Store trigram offsets for the given UTF-8 data. The
 // DocumentSections must correspond to rune boundaries in the UTF-8
 // data.
···
 
 	endRune := s.runeCount
 	for ; len(data) > 0; runeIndex++ {
-		c, sz := utf8.DecodeRune(data)
-		if sz > 1 {
+		// ASCII fast path: avoid utf8.DecodeRune call overhead.
+		// For source code, 95-99% of bytes are ASCII.
+		var c rune
+		sz := 1
+		if data[0] < utf8.RuneSelf {
+			c = rune(data[0])
+		} else {
+			c, sz = utf8.DecodeRune(data)
 			s.isPlainASCII = false
 		}
 		data = data[sz:]
···
 			continue
 		}
 
-		ng := runesToNGram(runeGram)
-		lastOff := s.lastOffsets[ng]
 		newOff := endRune + uint32(runeIndex) - 2
 
-		m := binary.PutUvarint(buf[:], uint64(newOff-lastOff))
-		s.postings[ng] = append(s.postings[ng], buf[:m]...)
-		s.lastOffsets[ng] = newOff
+		// ASCII trigrams use direct-indexed array (no hash/probe).
+		var pl *postingList
+		if runeGram[0] < utf8.RuneSelf && runeGram[1] < utf8.RuneSelf && runeGram[2] < utf8.RuneSelf {
+			idx := asciiNgramIndex(byte(runeGram[0]), byte(runeGram[1]), byte(runeGram[2]))
+			pl = s.asciiPostings[idx]
+			if pl == nil {
+				pl = &postingList{data: make([]byte, 0, initialPostingCap)}
+				s.asciiPostings[idx] = pl
+				s.asciiPopulated = append(s.asciiPopulated, idx)
+			} else if len(pl.data) == 0 {
+				// Retained from a previous shard (pool reuse) — re-record
+				// in asciiPopulated for this shard's writePostings.
+				s.asciiPopulated = append(s.asciiPopulated, idx)
+			}
+		} else {
+			ng := runesToNGram(runeGram)
+			pl = s.postings[ng]
+			if pl == nil {
+				pl = &postingList{data: make([]byte, 0, initialPostingCap)}
+				s.postings[ng] = pl
+			}
+		}
+		m := binary.PutUvarint(buf[:], uint64(newOff-pl.lastOff))
+		pl.data = append(pl.data, buf[:m]...)
+		pl.lastOff = newOff
 	}
 	s.runeCount += runeIndex
 
···
 // NewShardBuilder creates a fresh ShardBuilder. The passed in
 // Repository contains repo metadata, and may be set to nil.
 func NewShardBuilder(r *zoekt.Repository) (*ShardBuilder, error) {
-	b := newShardBuilder()
+	b := newShardBuilder(0)
 
 	if r == nil {
 		r = &zoekt.Repository{}
···
 	return b, nil
 }
 
-func newShardBuilder() *ShardBuilder {
+const defaultShardMax = 100 << 20 // 100 MB, matches Options.ShardMax default
+
+// newShardBuilder creates a ShardBuilder with fresh postingsBuilders.
+// shardMax is the maximum shard content size in bytes (0 uses defaultShardMax).
+func newShardBuilder(shardMax int) *ShardBuilder {
+	if shardMax <= 0 {
+		shardMax = defaultShardMax
+	}
+	return newShardBuilderWithPostings(
+		newPostingsBuilder(shardMax),
+		newPostingsBuilder(shardMax),
+	)
+}
+
+func newShardBuilderWithPostings(content, name *postingsBuilder) *ShardBuilder {
 	return &ShardBuilder{
 		indexFormatVersion: IndexFormatVersion,
 		featureVersion:     FeatureVersion,
 
-		contentPostings: newPostingsBuilder(),
-		namePostings:    newPostingsBuilder(),
+		contentPostings: content,
+		namePostings:    name,
 		fileEndSymbol:   []uint32{0},
 		symIndex:        make(map[string]uint32),
 		symKindIndex:    make(map[string]uint32),
```
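Each posting list stores occurrence offsets as uvarint-encoded deltas: the hot loop writes PutUvarint(newOff-pl.lastOff) and then advances lastOff. A minimal sketch of that encoding follows, plus a hypothetical decoder (decodeDeltas) to make the format concrete. It assumes, as holds within one builder, that offsets arrive in increasing order, so every delta is non-negative and small gaps cost a single byte.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeDeltas appends each offset as the uvarint-encoded difference from
// the previous one, starting from 0 as a fresh postingList does.
func encodeDeltas(offsets []uint32) []byte {
	var buf [binary.MaxVarintLen64]byte
	var out []byte
	var last uint32
	for _, off := range offsets {
		m := binary.PutUvarint(buf[:], uint64(off-last))
		out = append(out, buf[:m]...)
		last = off
	}
	return out
}

// decodeDeltas reverses encodeDeltas by summing the decoded gaps.
func decodeDeltas(data []byte) ([]uint32, error) {
	var offsets []uint32
	var last uint32
	for len(data) > 0 {
		d, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("malformed uvarint")
		}
		last += uint32(d)
		offsets = append(offsets, last)
		data = data[n:]
	}
	return offsets, nil
}

func main() {
	sample := []uint32{4, 10, 300, 70000}
	packed := encodeDeltas(sample)
	back, err := decodeDeltas(packed)
	fmt.Println(len(packed), back, err)
}
```

Here the four offsets fit in 7 bytes (gaps of 4, 6, 290, and 69,700 take 1, 1, 2, and 3 bytes) instead of 16 bytes as raw uint32 values.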
index/write.go (+24 -8)

```diff
···
 import (
 	"bufio"
 	"bytes"
+	"cmp"
 	"encoding/binary"
 	"encoding/json"
 	"fmt"
 	"io"
+	"slices"
 	"sort"
 	"time"
 
···
 func writePostings(w *writer, s *postingsBuilder, ngramText *simpleSection,
 	charOffsets *simpleSection, postings *compoundSection, endRunes *simpleSection,
 ) {
-	keys := make(ngramSlice, 0, len(s.postings))
-	for k := range s.postings {
-		keys = append(keys, k)
+	// Collect ngrams from both the ASCII direct-indexed array and the
+	// non-ASCII map, then sort by ngram value.
+	type ngramPosting struct {
+		ng ngram
+		pl *postingList
 	}
-	sort.Sort(keys)
+	all := make([]ngramPosting, 0, len(s.asciiPopulated)+len(s.postings))
+	for _, idx := range s.asciiPopulated {
+		pl := s.asciiPostings[idx]
+		if len(pl.data) > 0 {
+			all = append(all, ngramPosting{asciiIndexToNgram(idx), pl})
+		}
+	}
+	for k, pl := range s.postings {
+		if len(pl.data) > 0 {
+			all = append(all, ngramPosting{k, pl})
+		}
+	}
+	slices.SortFunc(all, func(a, b ngramPosting) int { return cmp.Compare(a.ng, b.ng) })
 
 	ngramText.start(w)
-	for _, k := range keys {
+	for _, np := range all {
 		var buf [8]byte
-		binary.BigEndian.PutUint64(buf[:], uint64(k))
+		binary.BigEndian.PutUint64(buf[:], uint64(np.ng))
 		w.Write(buf[:])
 	}
 	ngramText.end(w)
 
 	postings.start(w)
-	for _, k := range keys {
-		postings.addItem(w, s.postings[k])
+	for _, np := range all {
+		postings.addItem(w, np.pl.data)
 	}
 	postings.end(w)
 
```