fork of https://github.com/sourcegraph/zoekt

docs: some updates (#952)

Asked an agent to inspect the codebase and docs and update the docs to
better match reality. I did not use all of its suggestions; these are
the ones that seem reasonable.

Test Plan: n/a

+62 -6
+10 -3
README.md
@@ -78,10 +78,17 @@
 go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver
 $GOPATH/bin/zoekt-webserver -index ~/.zoekt/
 
-This will start a web server with a simple search UI at http://localhost:6070. See the [uuery syntax docs](doc/query_syntax.md)
-for more details on the query language.
+This will start a web server with a simple search UI at http://localhost:6070.
+See the [query syntax docs](doc/query_syntax.md) for more details on the query
+language.
+
+If you start the web server with `-rpc`, it exposes a [simple JSON search
+API](doc/json-api.md) at `http://localhost:6070/api/search`.
 
-If you start the web server with `-rpc`, it exposes a [simple JSON search API](doc/json-api.md) at `http://localhost:6070/search/api/search.
+The JSON API supports advanced features including:
+- Streaming search results (using the `FlushWallTime` option)
+- Alternative BM25 scoring (using the `UseBM25Scoring` option)
+- Context lines around matches (using the `NumContextLines` option)
 
 Finally, the web server exposes a gRPC API that supports [structured query objects](query/query.go) and advanced search options.
 
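The README change above names the JSON endpoint and three option names. As a rough illustration of what a request body for `http://localhost:6070/api/search` might look like, here is a minimal Go sketch. The field names `Q` and `Opts`, and the helper `buildSearchBody`, are assumptions made for this example and are not confirmed by the diff; only `NumContextLines` and `UseBM25Scoring` come from the text, and doc/json-api.md remains the authority on the real request shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchOptions carries only two of the option names mentioned in the README
// diff. The real options type likely has more fields; this is a sketch.
type searchOptions struct {
	NumContextLines int  `json:",omitempty"`
	UseBM25Scoring  bool `json:",omitempty"`
}

// searchRequest is an ASSUMED request shape ("Q" for the query string,
// "Opts" for options). Check doc/json-api.md before relying on it.
type searchRequest struct {
	Q    string
	Opts *searchOptions `json:",omitempty"`
}

// buildSearchBody (hypothetical helper) serializes a query and optional
// options into a JSON body that could be POSTed to /api/search.
func buildSearchBody(query string, opts *searchOptions) (string, error) {
	b, err := json.Marshal(searchRequest{Q: query, Opts: opts})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := buildSearchBody("needle", &searchOptions{
		NumContextLines: 2,
		UseBM25Scoring:  true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The sketch only builds the body so that it runs without a server; sending it would be an ordinary HTTP POST against a web server started with `-rpc`.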
+8 -1
doc/design.md
@@ -142,7 +142,8 @@
 * branch masks
 * metadata (repository name, index format version, etc.)
 
-In practice, the shard size is about 3x the corpus (size).
+In practice, the shard size is about 3.5x the corpus size, composed of
+original content, posting lists, and other metadata.
 
 The format uses uint32 for all offsets, so the total size of a shard
 should be below 4G. Given the size of the posting data, this caps
@@ -178,6 +179,12 @@
 For the latter, it is necessary to find symbol definitions and other
 sections within files on indexing. Several (imperfect) programs to do
 this already exist, eg. `ctags`.
+
+Zoekt also supports an alternative BM25-based scoring algorithm that can be
+enabled with `UseBM25Scoring`. When enabled, each match in a file is treated
+as a term, and an approximation to BM25 is computed. This is useful for
+multi-term queries, better handling of term frequency, and appropriate
+document length normalization.
 
 
 Query language
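To make the BM25 paragraph added in this diff more concrete, the sketch below shows the textbook BM25 per-term score in Go. It is not Zoekt's implementation, which the text describes only as "an approximation to BM25". The function name `termScore` and the parameter values `k1 = 1.2` and `b = 0.75` are common defaults assumed here for illustration.

```go
package main

import "fmt"

// Commonly used BM25 parameters (assumed; the diff does not state the
// values Zoekt uses).
const (
	k1 = 1.2  // controls term-frequency saturation
	b  = 0.75 // controls document-length normalization
)

// termScore returns the textbook BM25 contribution of a single term:
//
//	idf * tf*(k1+1) / (tf + k1*(1 - b + b*docLen/avgDocLen))
//
// tf is the number of matches of the term in the file, docLen the file
// length, and avgDocLen the average file length in the corpus.
func termScore(tf, docLen, avgDocLen, idf float64) float64 {
	norm := 1 - b + b*docLen/avgDocLen
	return idf * tf * (k1 + 1) / (tf + k1*norm)
}

func main() {
	// Term frequency saturates: ten matches score far less than 10x one match.
	fmt.Println(termScore(1, 100, 100, 1), termScore(10, 100, 100, 1))
	// Length normalization: the same match count in a longer file scores lower.
	fmt.Println(termScore(3, 100, 100, 1), termScore(3, 1000, 100, 1))
}
```

The two properties the paragraph mentions are visible in the formula: the score grows sublinearly in `tf`, and files longer than average are penalized through `norm`.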
+3 -1
doc/faq.md
@@ -111,7 +111,9 @@
 
 The search server should have local SSD to store the index file (which
 is 3.5x the corpus size), and have at least 20% more RAM than the
-corpus size.
+corpus size. For optimal performance with large codebases, consider
+using machines with ample CPU cores, as search operations can be
+parallelized across shards.
 
 ## Can I index multiple branches?
 
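The sizing guidance in this FAQ hunk (index about 3.5x the corpus, RAM at least 20% above the corpus size) reduces to simple arithmetic. The Go sketch below works it through; `estimateResources` is a hypothetical helper written for this example and is not part of Zoekt.

```go
package main

import "fmt"

// estimateResources applies the two rules of thumb from the FAQ text:
// on-disk index of about 3.5x the corpus, and RAM of at least 1.2x the corpus.
// Integer arithmetic (7/2 and 6/5) avoids floating-point rounding.
func estimateResources(corpusBytes int64) (indexBytes, ramBytes int64) {
	indexBytes = corpusBytes * 7 / 2 // 3.5x the corpus size
	ramBytes = corpusBytes * 6 / 5   // 20% more than the corpus size
	return indexBytes, ramBytes
}

func main() {
	const gib = int64(1) << 30
	index, ram := estimateResources(10 * gib)
	fmt.Printf("index: %d GiB, RAM: %d GiB\n", index/gib, ram/gib)
}
```

For a 10 GiB corpus this gives roughly 35 GiB of SSD for the index and at least 12 GiB of RAM. These are the FAQ's estimates, not measured values.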
+41 -1
doc/query_syntax.md
@@ -10,7 +10,7 @@
 
 A query is made up of expressions. An **expression** can be:
 - A negation (e.g., `-`),
-- A field (e.g., `repo:`).
+- A field (e.g., `repo:`),
 - A grouping (e.g., parentheses `()`),
 
 Logical `OR` operations combine multiple expressions. The **`AND` operator is implicit**, meaning multiple expressions written together will be automatically treated as `AND`.
@@ -86,6 +86,33 @@
 
 ---
 
+## Special Query Types
+
+### Filtering by Repository Type
+
+Zoekt supports filtering repositories by various attributes:
+
+```plaintext
+public:yes archived:no fork:no
+```
+
+This finds repositories that are public, not archived, and not forks.
+
+### Result Type Control
+
+The `type:` operator controls what kind of results are returned:
+
+```plaintext
+type:repo content:config
+```
+
+This returns repository names instead of file matches. Valid values include:
+- `filematch` - Returns file content matches (default)
+- `filename` - Returns only matching filenames
+- `repo` - Returns only repository names
+
+---
+
 ## Special Query Values
 
 - **Boolean Values**:
@@ -108,6 +135,19 @@
 ```plaintext
 content:/foo.*bar/
 ```
+
+---
+
+## Case Sensitivity
+
+Zoekt supports three case sensitivity modes:
+
+- `case:yes` - Exact case matching
+- `case:no` - Case-insensitive matching
+- `case:auto` - Automatically detect based on pattern (default)
+
+In auto mode, if the pattern contains uppercase letters, the search will be
+case-sensitive; otherwise, it will be case-insensitive.
 
 ---
 
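The auto-detection rule in the new "Case Sensitivity" section can be sketched in a few lines of Go. This is an illustration of the rule as the docs state it, not Zoekt's code. The function name `caseSensitive` is hypothetical, and treating every uppercase rune as significant (including those inside regex escapes such as `\S`) is a simplifying assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseSensitive reports whether a search should be case-sensitive for the
// given mode ("yes", "no", or "auto") and pattern. Unknown modes fall back
// to auto, matching the documented default.
func caseSensitive(mode, pattern string) bool {
	switch mode {
	case "yes":
		return true
	case "no":
		return false
	default: // "auto": any uppercase letter makes the search case-sensitive
		for _, r := range pattern {
			if unicode.IsUpper(r) {
				return true
			}
		}
		return false
	}
}

func main() {
	fmt.Println(caseSensitive("auto", "readFile")) // contains an uppercase letter
	fmt.Println(caseSensitive("auto", "readfile")) // all lowercase
	fmt.Println(caseSensitive("no", "ReadFile"))   // explicit mode wins
}
```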