-rw-r--r-- | app/controllers/chaos_controller.rb | 13
-rw-r--r-- | config/routes.rb | 10
-rw-r--r-- | doc/development/chaos_endpoints.md | 83
-rw-r--r-- | doc/development/performance.md | 1
4 files changed, 103 insertions, 4 deletions
diff --git a/app/controllers/chaos_controller.rb b/app/controllers/chaos_controller.rb
index bdb99995532..6593b748130 100644
--- a/app/controllers/chaos_controller.rb
+++ b/app/controllers/chaos_controller.rb
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 class ChaosController < ActionController::Base
+  before_action :validate_request
+
   def leakmem
     memory_mb = params[:memory_mb] ? params[:memory_mb].to_i : 100
     retainer = []
@@ -29,4 +31,15 @@ class ChaosController < ActionController::Base
     Process.kill("KILL", Process.pid)
   end
 
+  private
+
+  def validate_request
+    secret = ENV['GITLAB_CHAOS_SECRET']
+    return unless secret
+
+    unless request.headers["HTTP_X_CHAOS_SECRET"] == secret
+      render text: "To experience chaos, please set X-Chaos-Secret header", content_type: 'text/plain', status: 401
+    end
+  end
+
 end
diff --git a/config/routes.rb b/config/routes.rb
index d214356f3e9..d4c19a03ff8 100644
--- a/config/routes.rb
+++ b/config/routes.rb
@@ -83,10 +83,12 @@ Rails.application.routes.draw do
     draw :operations
     draw :instance_statistics
 
-    get '/chaos/leakmem' => 'chaos#leakmem'
-    get '/chaos/cpuspin' => 'chaos#cpuspin'
-    get '/chaos/sleep' => 'chaos#sleep'
-    get '/chaos/kill' => 'chaos#kill'
+    if ENV['GITLAB_ENABLE_CHAOS_ENDPOINTS']
+      get '/chaos/leakmem' => 'chaos#leakmem'
+      get '/chaos/cpuspin' => 'chaos#cpuspin'
+      get '/chaos/sleep' => 'chaos#sleep'
+      get '/chaos/kill' => 'chaos#kill'
+    end
   end
 
   draw :api
diff --git a/doc/development/chaos_endpoints.md b/doc/development/chaos_endpoints.md
new file mode 100644
index 00000000000..4546d1498c0
--- /dev/null
+++ b/doc/development/chaos_endpoints.md
@@ -0,0 +1,83 @@
+# Generating Chaos in a test GitLab instance
+
+As [Werner Vogels](https://twitter.com/Werner), the CTO at Amazon Web Services, famously put it, **Everything fails, all the time**.
+
+As a developer, it's as important to consider the failure modes in which your software will operate as it is to consider normal operation. Doing so can mean the difference between a minor hiccup leading to a scattering of 500 errors experienced by a tiny fraction of users and a full site outage affecting all users for an extended period.
+
+To paraphrase [Tolstoy](https://en.wikipedia.org/wiki/Anna_Karenina_principle), _all happy servers are alike, but all failing servers are failing in their own way_. Luckily, there are ways we can attempt to simulate these failure modes, and the chaos endpoints are tools for assisting in this process.
+
+Currently, there are four endpoints for simulating the following conditions: slow requests, CPU-bound requests, memory leaks and unexpected process crashes.
+
+## Enabling Chaos Endpoints
+
+For obvious reasons, these endpoints are not enabled by default. They can be enabled by setting the `GITLAB_ENABLE_CHAOS_ENDPOINTS` environment variable.
+
+For example, if you're using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) this can be done with the following command:
+
+```shell
+GITLAB_ENABLE_CHAOS_ENDPOINTS=1 gdk run
+```
+
+### Securing the Chaos Endpoints
+
+**It is highly recommended that you secure access to the Chaos endpoints using a secret token**. This is recommended when enabling these endpoints locally, and essential when running in a staging or other shared environment. _It goes without saying that you should not enable them in production unless you absolutely know what you're doing._
+
+A secret can be set through the `GITLAB_CHAOS_SECRET` environment variable. For example, when using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) this can be done with the following command line:
+
+```shell
+GITLAB_ENABLE_CHAOS_ENDPOINTS=1 GITLAB_CHAOS_SECRET=secret gdk run
+```
+
+Replace `secret` with your own secret token.
+
+## Invoking Chaos
+
+Once you have enabled the chaos endpoints and restarted the application you can start testing using the endpoints.
+
+### Memory Leaks
+
+To simulate a memory leak in your application, use the `/-/chaos/leakmem` endpoint.
+
+For example, if your GitLab instance is listening at `localhost:3000`, you could `curl` the endpoint as follows:
+
+```shell
+curl 'http://localhost:3000/-/chaos/leakmem?memory_mb=1024' -H 'X-Chaos-Secret: secret'
+```
+
+The `memory_mb` parameter tells the application how much memory it should leak.
+
+Note: the memory is not retained after the request, so once it's completed, the Ruby garbage collector will attempt to recover the memory.
+
+### CPU Spin
+
+This endpoint attempts to fully utilise a single core, at 100%, for the given period.
+
+```shell
+curl 'http://localhost:3000/-/chaos/cpuspin?duration_s=60' -H 'X-Chaos-Secret: secret'
+```
+
+The `duration_s` parameter will configure how long the core is utilised.
+
+Depending on your rack server setup, your request may time out after a predetermined period (normally 60 seconds). If you're using Unicorn, this is done by killing the worker process.
+
+### Sleep
+
+This endpoint is similar to the CPU Spin endpoint but simulates off-processor activity, such as backend services or IO. It will sleep for a given duration.
+
+```shell
+curl 'http://localhost:3000/-/chaos/sleep?duration_s=60' -H 'X-Chaos-Secret: secret'
+```
+
+The `duration_s` parameter will configure how long the request will sleep for.
+
+As with the CPU Spin endpoint, this may lead to your request timing out if the duration exceeds the configured limit.
+
+### Kill
+
+This endpoint will simulate the unexpected death of a worker process using a `kill` signal.
+
+```shell
+curl http://localhost:3000/-/chaos/kill -H 'X-Chaos-Secret: secret'
+```
+
+Note: since this endpoint uses the `KILL` signal, the worker is not given a chance to clean up or shut down.
diff --git a/doc/development/performance.md b/doc/development/performance.md
index c7b10dfd5ce..ec1ac2d49da 100644
--- a/doc/development/performance.md
+++ b/doc/development/performance.md
@@ -41,6 +41,7 @@ GitLab provides built-in tools to aid the process of improving performance:
 * [GitLab Performance Monitoring](../administration/monitoring/performance/index.md)
 * [Request Profiling](../administration/monitoring/performance/request_profiling.md)
 * [QueryRecoder](query_recorder.md) for preventing `N+1` regressions
+* [Chaos Endpoints](chaos_endpoints.md) less for performance, more for availability: tools for testing failure scenarios
 
 GitLab employees can use GitLab.com's performance monitoring systems located at
 <https://dashboards.gitlab.net>, this requires you to log in using your
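
Note on the two gates this patch introduces: the routes are only drawn when `GITLAB_ENABLE_CHAOS_ENDPOINTS` is set, and the controller only checks the `X-Chaos-Secret` header when `GITLAB_CHAOS_SECRET` is set. The following is a minimal sketch of that decision logic in plain Ruby, outside Rails. The helper names `chaos_routes_enabled?` and `chaos_secret_status` are hypothetical and do not appear in the patch; they take the environment as a hash so the behaviour can be exercised without a running instance.

```ruby
# Illustrative sketch only: these helpers are hypothetical and not part of the patch.

# Mirrors `if ENV['GITLAB_ENABLE_CHAOS_ENDPOINTS']` in config/routes.rb.
# ENV values are strings, and every string is truthy in Ruby, so any set
# value (including "0" or "") enables the routes; only an unset variable
# (nil) leaves them disabled.
def chaos_routes_enabled?(env)
  !env['GITLAB_ENABLE_CHAOS_ENDPOINTS'].nil?
end

# Mirrors ChaosController#validate_request. Returns 401 when the
# before_action would render the "unauthorized" response, or nil when
# the request is allowed to proceed to the action.
def chaos_secret_status(env, header_value)
  secret = env['GITLAB_CHAOS_SECRET']
  return nil unless secret # no secret configured: the check is skipped

  header_value == secret ? nil : 401
end

env = { 'GITLAB_ENABLE_CHAOS_ENDPOINTS' => '1', 'GITLAB_CHAOS_SECRET' => 'secret' }
puts chaos_routes_enabled?(env)                  # routes are drawn
puts chaos_secret_status(env, 'wrong').inspect   # rejected
puts chaos_secret_status(env, 'secret').inspect  # allowed
```

Read this way, two consequences of the patch as written stand out: setting `GITLAB_ENABLE_CHAOS_ENDPOINTS=0` still enables the routes, and enabling the routes without setting `GITLAB_CHAOS_SECRET` leaves them reachable without any header.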