Indexing: respect indexing buffer limit (#686)
When indexing documents, we buffer up documents until we reach the shard size
limit (100MB), then flush the shard. If we decide to skip a document because
it's a binary file, then (naturally) we don't count its content size towards
the shard limit. But we still buffered the full document. So if there are a large
number of binary files, we could easily blow past the 100MB limit and run into
memory issues.
This change simply clears `Content` whenever `SkipReason` is set. The
invariant: a buffered document should only ever have `SkipReason` or `Content`,
not both.