author | GitLab Bot <gitlab-bot@gitlab.com> | 2020-07-20 12:26:25 +0000 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2020-07-20 12:26:25 +0000 |
commit | a09983ae35713f5a2bbb100981116d31ce99826e | |
tree | 2ee2af7bd104d57086db360a7e6d8c9d5d43667a /doc/development/elasticsearch.md | |
parent | 18c5ab32b738c0b6ecb4d0df3994000482f34bd8 | |
download | gitlab-ce-a09983ae35713f5a2bbb100981116d31ce99826e.tar.gz | |
Add latest changes from gitlab-org/gitlab@13-2-stable-ee
Diffstat (limited to 'doc/development/elasticsearch.md')
-rw-r--r-- | doc/development/elasticsearch.md | 22 |
1 file changed, 15 insertions, 7 deletions
diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md
index 9f54386f1af..90debab3b5c 100644
--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
@@ -60,7 +60,7 @@ The `whitespace` tokenizer was selected in order to have more control over how t
 
 Please see the `code` filter for an explanation on how tokens are split.
 
-NOTE: **Known Issues**:
+NOTE: **Note:**
 Currently the [Elasticsearch code_analyzer doesn't account for all code cases](../integration/elasticsearch.md#known-issues).
 
 #### `code_search_analyzer`
@@ -111,11 +111,8 @@ Patterns:
 - `'"((?:\\"|[^"]|\\")*)"'`: captures terms inside quotes, removing the quotes
 - `"'((?:\\'|[^']|\\')*)'"`: same as above, for single-quotes
 - `'\.([^.]+)(?=\.|\s|\Z)'`: separate terms with periods in-between
-- `'\/?([^\/]+)(?=\/|\b)'`: separate path terms `like/this/one`
-
-#### `edgeNGram_filter`
-
-Uses an [Edge NGram token filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenfilter.html) to allow inputs with only parts of a token to find the token. For example it would turn `glasses` into permutations starting with `gl` and ending with `glasses`, which would allow a search for "`glass`" to find the original token `glasses`
+- `'([\p{L}_.-]+)'`: some common chars in file names to keep the whole filename intact (eg. `my_file-ñame.txt`)
+- `'([\p{L}\d_]+)'`: letters, numbers and underscores are the most common tokens in programming. Always capture them greedily regardless of context.
 
 ## Gotchas
 
@@ -160,7 +157,8 @@ The global configurations per version are now in the `Elastic::(Version)::Config
 
 ### Creating new version of schema
 
-NOTE: **Note:** this is not applicable yet as multiple indices functionality is not fully implemented.
+NOTE: **Note:**
+This is not applicable yet as multiple indices functionality is not fully implemented.
 
 Folders like `ee/lib/elastic/v12p1` contain snapshots of search logic from different versions.
 To keep a continuous Git history, the latest version lives under `ee/lib/elastic/latest`, but its classes are aliased under an actual version (e.g. `ee/lib/elastic/v12p3`). When referencing these classes, never use the `Latest` namespace directly, but use the actual version (e.g. `V12p3`).
@@ -222,6 +220,16 @@ be used both locally in development and on any deployed GitLab instance to
 diagnose poor search performance. This will show the exact queries being made,
 which is useful to diagnose why a search might be slow.
 
+### Correlation ID and X-Opaque-Id
+
+Our [correlation
+ID](./distributed_tracing.md#developer-guidelines-for-working-with-correlation-ids)
+is forwarded by all requests from Rails to Elasticsearch as the
+[`X-Opaque-Id`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html#_identifying_running_tasks)
+header which allows us to track any
+[tasks](https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html)
+in the cluster back the request in GitLab.
+
 ## Troubleshooting
 
 ### Getting `flood stage disk watermark [95%] exceeded`
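The two patterns this commit adds to the `code` filter can be tried outside Elasticsearch. The following is a rough Python approximation, not Lucene's `pattern_capture` implementation. Python's `re` module has no `\p{L}` class, so `[^\W\d_]` stands in for "any Unicode letter", and because Java's `\d` is ASCII-only by default, `[0-9_]` is used for the digits. The ordering and duplicate handling of the real filter may differ. `capture_tokens`, `FILENAME_PATTERN` and `IDENTIFIER_PATTERN` are hypothetical names.

```python
import re

# Rough stand-ins for the Java patterns added in this commit.
# Python's re has no \p{L}; [^\W\d_] matches Unicode letters (plus a few
# non-decimal numeric characters such as '½'). Java's \d is ASCII-only by
# default, hence [0-9_] in the second pattern.
FILENAME_PATTERN = re.compile(r"((?:[^\W\d_]|[_.\-])+)")    # '([\p{L}_.-]+)'
IDENTIFIER_PATTERN = re.compile(r"((?:[^\W\d_]|[0-9_])+)")  # '([\p{L}\d_]+)'


def capture_tokens(token, patterns):
    """Collect the non-empty capture groups of every match of every pattern.

    Each pattern scans the whole token on its own, so captures from different
    patterns can overlap. Duplicates are dropped, keeping first-seen order.
    """
    tokens = []
    for pattern in patterns:
        for match in pattern.finditer(token):
            for group in match.groups():
                if group and group not in tokens:
                    tokens.append(group)
    return tokens


print(capture_tokens("my_file-ñame.txt", [FILENAME_PATTERN, IDENTIFIER_PATTERN]))
# ['my_file-ñame.txt', 'my_file', 'ñame', 'txt']
```

The file-name pattern keeps `my_file-ñame.txt` intact, while the identifier pattern also yields the letter, digit and underscore runs inside it, in line with the intent of the two new bullet points.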
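The new section states that the correlation ID reaches Elasticsearch as the `X-Opaque-Id` header. Elasticsearch reports that header in the `headers` field of each task returned by its tasks API, so the tasks belonging to one GitLab request can be picked out by filtering on it. A minimal sketch, assuming the default node-grouped `GET _tasks` response has already been parsed into a dict. The function name and the sample node and task IDs are hypothetical.

```python
def tasks_for_correlation_id(tasks_response, correlation_id):
    """Return the IDs of tasks whose X-Opaque-Id header equals correlation_id.

    Assumes the default node-grouped shape of a `GET _tasks` response:
    {"nodes": {node_id: {"tasks": {task_id: {"headers": {...}, ...}}}}}.
    Tasks without a headers field are skipped.
    """
    matches = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("headers", {}).get("X-Opaque-Id") == correlation_id:
                matches.append(task_id)
    return matches


# Hypothetical, trimmed response used only to illustrate the lookup.
example = {
    "nodes": {
        "node-a": {
            "tasks": {
                "node-a:1": {"action": "indices:data/read/search",
                             "headers": {"X-Opaque-Id": "abc123"}},
                "node-a:2": {"action": "indices:data/read/search",
                             "headers": {}},
            }
        }
    }
}
print(tasks_for_correlation_id(example, "abc123"))  # ['node-a:1']
```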