fix/index: preserve skipped-file category through shard paths (#1073)

ShardBuilder.Add still determined the language before the file category for
callers that bypass Builder.Add. In that path, skipped content can be replaced
with the not-indexed marker, so detecting the language first left category
detection operating on synthetic content instead of the original file, and it
also missed the cheaper skip-aware language path.

This changes ShardBuilder.Add to determine the file category before rewriting
skipped content and to infer the language afterward. Direct ShardBuilder
callers now follow the same behavior as Builder.Add, and skipped documents keep
content-aware categorization.
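
The ordering described above can be sketched as follows. This is a minimal
illustration, not the actual implementation: the `doc` type, the
`detectCategory` and `detectLanguage` helpers, the marker text, and the
category strings are all hypothetical stand-ins chosen for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Every name in this file is a hypothetical stand-in, not the project's real API.

// notIndexedMarker is an assumed placeholder for the synthetic content
// written in place of a skipped file.
const notIndexedMarker = "NOT-INDEXED: too large"

type doc struct {
	Name       string
	Content    []byte
	SkipReason string // non-empty when the file is skipped
	Category   string
	Language   string
}

// detectCategory inspects the content, so it needs the original bytes.
func detectCategory(d *doc) string {
	if bytes.Contains(d.Content, []byte("Code generated")) {
		return "generated"
	}
	return "default"
}

// detectLanguage is skip-aware: skipped documents are classified by name
// only, which avoids looking at content at all.
func detectLanguage(d *doc) string {
	if d.SkipReason == "" && bytes.HasPrefix(d.Content, []byte("#!/bin/sh")) {
		return "Shell"
	}
	if strings.HasSuffix(d.Name, ".go") {
		return "Go"
	}
	return ""
}

// add applies the order from this change: category first, then the
// content rewrite for skipped files, then language.
func add(d *doc) {
	d.Category = detectCategory(d)
	if d.SkipReason != "" {
		d.Content = []byte(notIndexedMarker)
	}
	d.Language = detectLanguage(d)
}

func main() {
	d := &doc{
		Name:       "gen.go",
		Content:    []byte("// Code generated by tool. DO NOT EDIT.\n"),
		SkipReason: "too large",
	}
	add(d)
	fmt.Println(d.Category, d.Language, string(d.Content))
}
```

If the content rewrite ran before `detectCategory` in this sketch, the skipped
file would be categorized from the marker text and lose its "generated" label.
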

Shard merging also reconstructs documents through ShardBuilder, so this PR
carries the category already stored in the source shard into the rebuilt
document. That keeps merging metadata-preserving instead of forcing category
inference to rediscover what the original indexer already knew.
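
The merge behavior can be sketched in the same spirit. Again, this is only an
assumption-laden illustration: `shardBuilder`, `merge`, `inferCategory`, and
the use of an empty string for "no stored category" are hypothetical and do not
reflect the real types.

```go
package main

import (
	"bytes"
	"fmt"
)

// Every name in this file is a hypothetical stand-in, not the project's real API.

const notIndexedMarker = "NOT-INDEXED: too large"

type doc struct {
	Name     string
	Content  []byte
	Category string // empty means "not yet determined"
}

type shardBuilder struct {
	docs []doc
}

// inferCategory looks at content, so it gives the wrong answer when the
// content has already been replaced by the not-indexed marker.
func inferCategory(d doc) string {
	if bytes.Contains(d.Content, []byte("Code generated")) {
		return "generated"
	}
	return "default"
}

// add infers a category only when the caller did not supply one.
func (b *shardBuilder) add(d doc) {
	if d.Category == "" {
		d.Category = inferCategory(d)
	}
	b.docs = append(b.docs, d)
}

// merge rebuilds documents from existing shards and passes the stored
// category through instead of clearing it.
func merge(shards ...[]doc) *shardBuilder {
	b := &shardBuilder{}
	for _, shard := range shards {
		for _, d := range shard {
			b.add(doc{Name: d.Name, Content: d.Content, Category: d.Category})
		}
	}
	return b
}

func main() {
	// In the source shard the skipped file only has marker content left,
	// but its category was recorded at index time.
	src := []doc{{Name: "gen.go", Content: []byte(notIndexedMarker), Category: "generated"}}
	merged := merge(src)
	fmt.Println(merged.docs[0].Name, merged.docs[0].Category)
}
```

Without the carried category, `inferCategory` in this sketch would only see the
marker text and fall back to "default" for the merged document.
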