fork of https://github.com/sourcegraph/zoekt
0

Configure Feed

Select the types of activity you want to include in your feed.

indexserver: remove shards with inconsistent repo name (#244)

When we switched to being based on ID we stopped cleaning up
repositories after rename. So the old repository shard would remain on
disk and never get updated. This is since zoekt will look for a
repository by name rather than by ID. So the old shard would be
invisible when updating an index.

This fix is to add to our cleanup process something which ensures the
names are consistent with a repository ID. If not it bails out by just
removing the shards and letting us converge to the correct state. This
means when a rename occurs we have this unfortunate steps:

1. index newname
2. delete newname and oldname
3. index newname

Given the rareness of this, I believe this is an acceptable fix.
However, the better (and more involved) fix is to move the shard names
from being based on repo name to repo ID. Or we provide a way to inform
zoekt-git-index about all shards based on ID. We already do something
like this for compound shards. However, I believe that should be future
work.

+74 -12
+74 -12
cmd/zoekt-sourcegraph-indexserver/cleanup.go
··· 59 59 delete(trash, repo) 60 60 } 61 61 62 + // index: We are ID based, but store shards by name still. If we end up with 63 + // shards that have the same ID but different names delete and start over. 64 + // This can happen when a repository is renamed. In future we should make 65 + // shard file names based on ID. 66 + for repo, shards := range index { 67 + if consistentRepoName(shards) { 68 + continue 69 + } 70 + 71 + // prevent further processing since we will delete 72 + delete(index, repo) 73 + 74 + // This should be rare, so give an informative log message. 75 + var paths []string 76 + for _, shard := range shards { 77 + paths = append(paths, filepath.Base(shard.Path)) 78 + } 79 + log.Printf("removing shards for %v due to multiple repository names: %s", repo, strings.Join(paths, " ")) 80 + 81 + // We may be in both normal and compound shards in this case. First 82 + // tombstone the compound shards so we don't just rm them. 83 + simple := shards[0:] 84 + for _, s := range shards { 85 + if shardMerging && maybeSetTombstone([]shard{s}, repo) { 86 + shardsLog(indexDir, "tombname", []shard{s}) 87 + } else { 88 + simple = append(simple, s) 89 + } 90 + } 91 + 92 + if len(simple) == 0 { 93 + continue 94 + } 95 + 96 + removeAll(simple...) 97 + shardsLog(indexDir, "removename", simple) 98 + } 99 + 62 100 // index: Move missing repos from trash into index 63 101 for _, repo := range repos { 64 102 // Delete from index so that index will only contain shards to be ··· 83 121 _ = os.Chtimes(shard.Path, now, now) 84 122 } 85 123 86 - if shardMerging { 87 - // 1 repo can be split across many simple shards but it should only be contained 88 - // in 1 compound shard. Hence we check that len(shards)==1 and only consider the 89 - // shard at index 0. 90 - if len(shards) == 1 && strings.HasPrefix(filepath.Base(shards[0].Path), "compound-") { 91 - shardsLog(indexDir, "tomb", shards) 92 - if err := zoekt.SetTombstone(shards[0].Path, repo); err != nil { 93 - log.Printf("error setting tombstone for %v in shard %s: %s. Removing shard\n", repo, shards[0].Path, err) 94 - _ = os.Remove(shards[0].Path) 95 - } 96 - continue 97 - } 124 + if shardMerging && maybeSetTombstone(shards, repo) { 125 + shardsLog(indexDir, "tomb", shards) 126 + continue 98 127 } 99 128 moveAll(trashDir, shards) 100 129 shardsLog(indexDir, "remove", shards) ··· 253 282 // update shards so partial failure removes the dst path 254 283 shards[i] = dstShard 255 284 } 285 + } 286 + 287 + // consistentRepoName returns true if the list of shards have a unique 288 + // repository name. 289 + func consistentRepoName(shards []shard) bool { 290 + if len(shards) <= 1 { 291 + return true 292 + } 293 + name := shards[0].RepoName 294 + for _, shard := range shards[1:] { 295 + if shard.RepoName != name { 296 + return false 297 + } 298 + } 299 + return true 300 + } 301 + 302 + // maybeSetTombstone will call zoekt.SetTombstone for repoID if shards 303 + // represents a compound shard. It returns true if shards represents a 304 + // compound shard. 305 + func maybeSetTombstone(shards []shard, repoID uint32) bool { 306 + // 1 repo can be split across many simple shards but it should only be contained 307 + // in 1 compound shard. Hence we check that len(shards)==1 and only consider the 308 + // shard at index 0. 309 + if len(shards) != 1 || !strings.HasPrefix(filepath.Base(shards[0].Path), "compound-") { 310 + return false 311 + } 312 + 313 + if err := zoekt.SetTombstone(shards[0].Path, repoID); err != nil { 314 + log.Printf("error setting tombstone for %d in shard %s: %s. Removing shard\n", repoID, shards[0].Path, err) 315 + _ = os.Remove(shards[0].Path) 316 + } 317 + return true 256 318 } 257 319 258 320 func shardsLog(indexDir, action string, shards []shard) {