fork of https://github.com/sourcegraph/zoekt

Update README (#905)

This PR updates the README to clarify Zoekt's current design and explain the main usage patterns.

Total: +53 -111 (4 files changed)
README.md (+50 -110)
```diff
@@ -3,148 +3,88 @@
 
     ("seek, and ye shall eat spinach" - My primary school teacher)
 
-This is a fast text search engine, intended for use with source
+Zoekt is a text search engine intended for use with source
 code. (Pronunciation: roughly as you would pronounce "zooked" in English)
 
-**Note:** This is a [Sourcegraph](https://github.com/sourcegraph/zoekt) fork
-of [github.com/google/zoekt](https://github.com/google/zoekt). It is now the
-main maintained source of Zoekt.
-
-# INSTRUCTIONS
-
-## Downloading
-
-    go get github.com/sourcegraph/zoekt/
+**Note:** This has been the maintained source for Zoekt since 2017, when it was forked from the
+original repository [github.com/google/zoekt](https://github.com/google/zoekt).
 
-## Indexing
+## Background
 
-### Directory
+Zoekt supports fast substring and regexp matching on source code, with a rich query language
+that includes boolean operators (and, or, not). It can search individual repositories, and search
+across many repositories in a large codebase. Zoekt ranks search results using a combination of code-related signals
+like whether the match is on a symbol. Because of its general design based on trigram indexing and syntactic
+parsing, it works well for a variety of programming languages.
 
-    go install github.com/sourcegraph/zoekt/cmd/zoekt-index
-    $GOPATH/bin/zoekt-index .
+The two main ways to use the project are
+* Through individual commands, to index repositories and perform searches through Zoekt's [query language](doc/query_syntax.md)
+* Or, through the indexserver and webserver, which support syncing repositories from a code host and searching them through a web UI or API
 
-### Git repository
+For more details on Zoekt's design, see the [docs directory](doc/).
 
-    go install github.com/sourcegraph/zoekt/cmd/zoekt-git-index
-    $GOPATH/bin/zoekt-git-index -branches master,stable-1.4 -prefix origin/ .
+## Usage
 
-### Repo repositories
+### Installation
 
-    go install github.com/sourcegraph/zoekt/cmd/zoekt-{repo-index,mirror-gitiles}
-    zoekt-mirror-gitiles -dest ~/repos/ https://gfiber.googlesource.com
-    zoekt-repo-index \
-        -name gfiber \
-        -base_url https://gfiber.googlesource.com/ \
-        -manifest_repo ~/repos/gfiber.googlesource.com/manifests.git \
-        -repo_cache ~/repos \
-        -manifest_rev_prefix=refs/heads/ --rev_prefix= \
-        master:default_unrestricted.xml
+    go get github.com/sourcegraph/zoekt/
 
-## Searching
+**Note**: It is also recommended to install [Universal ctags](https://github.com/universal-ctags/ctags), as symbol
+information is a key signal in ranking search results. See [ctags.md](doc/ctags.md) for more information.
 
-### Web interface
+### Command-based usage
 
-    go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver
-    $GOPATH/bin/zoekt-webserver -listen :6070
+Zoekt supports indexing and searching repositories on the command line. This is most helpful
+for simple local usage, or for testing and development.
 
-### JSON API
+#### Indexing a local git repo
 
-You can retrieve search results as JSON by sending a GET request to zoekt-webserver.
+    go install github.com/sourcegraph/zoekt/cmd/zoekt-git-index
+    $GOPATH/bin/zoekt-git-index -index ~/.zoekt /path/to/repo
 
-    curl --get \
-        --url "http://localhost:6070/search" \
-        --data-urlencode "q=ngram f:READ" \
-        --data-urlencode "num=50" \
-        --data-urlencode "format=json"
+#### Indexing a local directory (not git-specific)
 
-The response data is a JSON object. You can refer to [web.ApiSearchResult](https://sourcegraph.com/github.com/sourcegraph/zoekt@6b1df4f8a3d7b34f13ba0cafd8e1a9b3fc728cf0/-/blob/web/api.go?L23:6&subtree=true) to learn about the structure of the object.
+    go install github.com/sourcegraph/zoekt/cmd/zoekt-index
+    $GOPATH/bin/zoekt-index -index ~/.zoekt /path/to/repo
 
-### CLI
+#### Searching an index
 
     go install github.com/sourcegraph/zoekt/cmd/zoekt
-    $GOPATH/bin/zoekt 'ngram f:READ'
-
-## Installation
-A more organized installation on a Linux server should use a systemd unit file,
-eg.
-
-    [Unit]
-    Description=zoekt webserver
-
-    [Service]
-    ExecStart=/zoekt/bin/zoekt-webserver -index /zoekt/index -listen :443 --ssl_cert /zoekt/etc/cert.pem --ssl_key /zoekt/etc/key.pem
-    Restart=always
-
-    [Install]
-    WantedBy=default.target
-
+    $GOPATH/bin/zoekt 'hello'
+    $GOPATH/bin/zoekt 'hello file:README'
 
-# SEARCH SERVICE
+### Zoekt services
 
-Zoekt comes with a small service management program:
+Zoekt also contains an index server and web server to support larger-scale indexing and searching
+of remote repositories. The index server can be configured to periodically fetch and reindex repositories
+from a code host. The webserver can be configured to serve search results through a web UI or API.
 
+#### Indexing a GitHub organization
+
     go install github.com/sourcegraph/zoekt/cmd/zoekt-indexserver
 
-    cat << EOF > config.json
-    [{"GithubUser": "username"},
-    {"GithubOrg": "org"},
-    {"GitilesURL": "https://gerrit.googlesource.com", "Name": "zoekt" }
-    ]
-    EOF
+    echo YOUR_GITHUB_TOKEN_HERE > token.txt
+    echo '[{"GitHubOrg": "apache", "CredentialPath": "token.txt"}]' > config.json
 
-    $GOPATH/bin/zoekt-indexserver -mirror_config config.json
-
-This will mirror all repos under 'github.com/username', 'github.com/org', as
-well as the 'zoekt' repository. It will index the repositories.
+    $GOPATH/bin/zoekt-indexserver -mirror_config config.json -data_dir ~/.zoekt/
 
-It takes care of fetching and indexing new data and cleaning up logfiles.
+This will fetch all repos under 'github.com/apache', then index the repositories. The indexserver takes care of
+periodically fetching and indexing new data, and cleaning up logfiles. See [config.go](cmd/zoekt-indexserver/config.go)
+for more details on this configuration.
 
-The webserver can be started from a standard service management framework, such
-as systemd.
+#### Starting the web server
 
+    go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver
+    $GOPATH/bin/zoekt-webserver -index ~/.zoekt/
 
-# SYMBOL SEARCH
+This will start a web server with a simple search UI at http://localhost:6070. See the [uuery syntax docs](doc/query_syntax.md)
+for more details on the query language.
 
-It is recommended to install [Universal
-ctags](https://github.com/universal-ctags/ctags) to improve
-ranking. See [here](doc/ctags.md) for more information.
+If you start the web server with `-rpc`, it exposes a [simple JSON search API](doc/json-api.md) at `http://localhost:6070/search/api/search.
 
+Finally, the web server exposes a gRPC API that supports [structured query objects](query/query.go) and advanced search options.
 
-# ACKNOWLEDGEMENTS
+## Acknowledgements
 
 Thanks to Han-Wen Nienhuys for creating Zoekt. Thanks to Alexander Neubeck for
 coming up with this idea, and helping Han-Wen Nienhuys flesh it out.
-
-
-# FORK DETAILS
-
-Originally this fork contained some changes that do not make sense to upstream
-and or have not yet been upstreamed. However, this is now the defacto source
-for Zoekt. This section will remain for historical reasons and contains
-outdated information. It can be removed once the dust settles on moving from
-google/zoekt to sourcegraph/zoekt. Differences:
-
-- [zoekt-sourcegraph-indexserver](cmd/zoekt-sourcegraph-indexserver/main.go)
-  is a Sourcegraph specific command which indexes all enabled repositories on
-  Sourcegraph, as well as keeping the indexes up to date.
-- We have exposed the API via
-  [keegancsmith/rpc](https://github.com/keegancsmith/rpc) (a fork of `net/rpc`
-  which supports cancellation).
-- Query primitive `BranchesRepos` to efficiently specify a set of repositories to
-  search.
-- Allow empty shard directories on startup. Needed when starting a fresh
-  instance which hasn't indexed anything yet.
-- We can return symbol/ctag data in results. Additionally we can run symbol regex queries.
-- We search shards in order of repo name and ignore shard ranking.
-- Other minor changes.
-
-Assuming you have the gerrit upstream configured, a useful way to see what we
-changed is:
-
-``` shellsession
-$ git diff gerrit/master -- ':(exclude)vendor/' ':(exclude)Gopkg*'
-```
-
-# DISCLAIMER
-
-This is not an official Google product
```
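The new Background section says that Zoekt's design is based on trigram indexing. The toy Go program below sketches that general technique. It is not Zoekt's actual index format, which is considerably more elaborate. The program works in three steps:

* It splits each document into overlapping three-byte substrings.
* It keeps a posting list from each trigram to the documents that contain it.
* It answers a substring query by intersecting the posting lists of the query's own trigrams, then confirming each candidate with a direct substring check.

The names `trigramIndex`, `newTrigramIndex`, and `Search` are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// trigramIndex is a hypothetical, in-memory inverted index from trigrams to
// document IDs. It only illustrates the idea; it is not Zoekt's index format.
type trigramIndex struct {
	docs     []string
	postings map[string][]int // trigram -> sorted, de-duplicated doc IDs
}

// trigrams returns the distinct overlapping 3-byte substrings of s
// (byte-based, so this sketch is only exact for ASCII text).
func trigrams(s string) []string {
	seen := map[string]bool{}
	var out []string
	for i := 0; i+3 <= len(s); i++ {
		t := s[i : i+3]
		if !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

func newTrigramIndex(docs []string) *trigramIndex {
	idx := &trigramIndex{docs: docs, postings: map[string][]int{}}
	for id, d := range docs {
		// Docs are visited in increasing ID order and trigrams are distinct per
		// doc, so every posting list stays sorted and duplicate-free.
		for _, t := range trigrams(d) {
			idx.postings[t] = append(idx.postings[t], id)
		}
	}
	return idx
}

// intersect returns the IDs present in both sorted lists.
func intersect(a, b []int) []int {
	var out []int
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// Search returns the IDs of documents that contain q as a substring.
func (idx *trigramIndex) Search(q string) []int {
	ts := trigrams(q)
	var candidates []int
	if len(ts) == 0 {
		// Queries shorter than 3 bytes cannot use the index: scan every doc.
		for id := range idx.docs {
			candidates = append(candidates, id)
		}
	} else {
		candidates = idx.postings[ts[0]]
		for _, t := range ts[1:] {
			candidates = intersect(candidates, idx.postings[t])
		}
	}
	// Trigram hits are only candidates; verify with a real substring check.
	var hits []int
	for _, id := range candidates {
		if strings.Contains(idx.docs[id], q) {
			hits = append(hits, id)
		}
	}
	return hits
}

func main() {
	idx := newTrigramIndex([]string{
		`func main() { fmt.Println("hello") }`,
		"// say hello to the world",
		"package zoekt",
	})
	fmt.Println(idx.Search("hello")) // [0 1]
	fmt.Println(idx.Search("zoekt")) // [2]
	fmt.Println(idx.Search("help"))  // []
}
```

The posting-list intersection only narrows the candidate set. The final `strings.Contains` check is what makes each result correct. Queries shorter than three bytes fall back to a full scan in this sketch.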
api.go (+1 -1)
```diff
@@ -12,7 +12,7 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-package zoekt // import "github.com/sourcegraph/zoekt"
+package zoekt
 
 import (
 	"context"
```
doc/api.md → doc/json-api.md (renamed, no content changes)
doc/query_syntax.md (+2)
```diff
@@ -2,6 +2,8 @@
 
 This guide explains the Zoekt query language, used for searching text within Git repositories. Zoekt queries allow combining multiple filters and expressions using logical operators, negations, and grouping. Here's how to craft queries effectively.
 
+For a brief overview of Zoekt's query syntax, see [these great docs from neogrok](https://neogrok-demo-web.fly.dev/syntax).
+
 ---
 
 ## Syntax Overview
```
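The query-syntax guide describes queries that combine plain terms with filters. The README's example `hello file:README` is a concrete case: it asks for files whose content contains "hello" and whose name matches "README". The hypothetical tokenizer below makes that decomposition explicit. It splits on whitespace, treats `file:` (and the short form `f:` from the older README example `ngram f:READ`) as a file-name filter, and implicitly ANDs everything together. It deliberately ignores quoting, parentheses, `or`, negation, regular expressions, and case rules, all of which the real grammar handles. It is not Zoekt's parser, which lives in its `query` package.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveQuery is a hypothetical, simplified view of a Zoekt query: content
// terms plus file-name filters, all implicitly ANDed together.
type naiveQuery struct {
	Terms       []string // matched against file contents
	FileFilters []string // matched against file names
}

// fileFields lists the only filter prefixes this sketch recognizes.
var fileFields = map[string]bool{"file": true, "f": true}

// parseNaive splits q on whitespace. It does NOT handle quotes, parentheses,
// "or", or negation, unlike Zoekt's real query parser.
func parseNaive(q string) naiveQuery {
	var nq naiveQuery
	for _, tok := range strings.Fields(q) {
		field, value, ok := strings.Cut(tok, ":")
		if ok && fileFields[field] && value != "" {
			nq.FileFilters = append(nq.FileFilters, value)
			continue
		}
		nq.Terms = append(nq.Terms, tok)
	}
	return nq
}

// Matches reports whether a file satisfies every filter and every term.
// Case-sensitive substring checks stand in for Zoekt's regexp and case handling.
func (nq naiveQuery) Matches(name, content string) bool {
	for _, f := range nq.FileFilters {
		if !strings.Contains(name, f) {
			return false
		}
	}
	for _, t := range nq.Terms {
		if !strings.Contains(content, t) {
			return false
		}
	}
	return true
}

func main() {
	q := parseNaive("hello file:README")
	fmt.Printf("%+v\n", q)                           // {Terms:[hello] FileFilters:[README]}
	fmt.Println(q.Matches("README.md", "say hello")) // true
	fmt.Println(q.Matches("main.go", "say hello"))   // false
}
```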