ChunkMatches: drop trailing newline (#709)

I'd like to start taking advantage of the NumContextLines option, but have been running into an issue where, if context lines were to extend past the end of the file, we get the trailing newline at the end of the file.

This is not desirable because the empty slice after a trailing newline is not treated as a "line" by any editor, so when we split on newlines, an empty line is shown to the user. By the time we're splitting on newlines, we do not know whether or not we're at the end of the file, so we cannot know whether we can trim that trailing newline, or whether it is, correctly, an empty line that should be rendered.

This is really annoying stuff that bites us elsewhere as well (searcher, syntax highlighter). It might be nice to unify the definitions of "what is a line" in some library, but for now, I'll be happy with "don't return an extra line."

author

Camden Cheek committer

GitHub date 2 years ago (Dec 11, 2023, 9:02 AM -0700) commit e92f6c76 e92f6c764b567d9b359633fa4045cfba9d37eb3c parent 7b678e15 7b678e15c3489c7d7dea7ad709dc2eed46c5a1f5

+23 -4

4 changed files

Expand all

.github

workflows

ci.yml

contentprovider.go

contentprovider_test.go

web

e2e_test.go

+5 -1

.github/workflows/ci.yml

··· 23 23 name: fuzz test 24 24 runs-on: ubuntu-latest 25 25 steps: 26 - - uses: jidicula/go-fuzz-action@4f24eed45b25214f31a9fe035ca68ea2c88c6a13 # v1.2.0 26 + # Pinned a commit to make go version configurable. 27 + # This should be safe to upgrade once this commit is in a released version: 28 + # https://github.com/jidicula/go-fuzz-action/commit/23cc553941669144159507e2cccdbb4afc5b3076 29 + - uses: jidicula/go-fuzz-action@0206b61afc603b665297621fa5e691b1447a5e57 27 30 with: 28 31 packages: 'github.com/sourcegraph/zoekt' # This is the package where the Protobuf round trip tests are defined 29 32 fuzz-time: 30s 30 33 fuzz-minimize-time: 1m 34 + go-version: '1.21' 31 35 32 36 shellcheck: 33 37 name: shellcheck

+12 -1

contentprovider.go

··· 454 454 lowStart, _ := nls.lineBounds(low) 455 455 _, highEnd := nls.lineBounds(high - 1) 456 456 457 + // Drop any trailing newline. Editors do not treat a trailing newline as 458 + // the start of a new line, so we should not either. lineBounds clamps to 459 + // len(data) when an out-of-bounds line is requested. 460 + // 461 + // As an example, if we request lines 1-5 from a file with contents 462 + // `one\ntwo\nthree\n`, we should return `one\ntwo\nthree` because those are 463 + // the three "lines" in the file, separated by newlines. 464 + if highEnd == uint32(len(data)) && bytes.HasSuffix(data, []byte{'\n'}) { 465 + highEnd = highEnd - 1 466 + lowStart = min(lowStart, highEnd) 467 + } 468 + 457 469 return data[lowStart:highEnd] 458 470 } 459 471 ··· 680 692 } 681 693 return data[sec.Start:sec.End] 682 694 } 683 - 684 695 685 696 // scoreSymbolKind boosts a match based on the combination of language, symbol 686 697 // and kind. The language string comes from go-enry, the symbol and kind from

+2 -1

contentprovider_test.go

··· 32 32 for _, content := range contents { 33 33 t.Run("", func(t *testing.T) { 34 34 newLines := getNewlines(content) 35 - lines := bytes.Split(content, []byte{'\n'}) // TODO does split group consecutive sep? 35 + // Trim the last newline before splitting because a trailing newline does not constitute a new line 36 + lines := bytes.Split(bytes.TrimSuffix(content, []byte{'\n'}), []byte{'\n'}) 36 37 wantGetLines := func(low, high int) []byte { 37 38 low-- 38 39 high--

+4 -1

web/e2e_test.go

··· 560 560 // Returns 3 instead of 4 new line characters since we swallow 561 561 // the last new line in Before, Fragments and After. 562 562 Before: "\n\n\n", 563 - After: "\n\n\n", 563 + // Returns 2 instead of 3 new line characters since a 564 + // trailing newline at the end of the file does not 565 + // constitue a new line. 566 + After: "\n\n", 564 567 }, 565 568 }, 566 569 },

Configure Feed

Configure Feed