diff options
Diffstat (limited to 'doc/administration/monitoring')
13 files changed, 601 insertions, 0 deletions
diff --git a/doc/administration/monitoring/health_check.md b/doc/administration/monitoring/health_check.md new file mode 100644 index 00000000000..0d17799372f --- /dev/null +++ b/doc/administration/monitoring/health_check.md @@ -0,0 +1,66 @@ +# Health Check + +>**Note:** This feature was [introduced][ce-3888] in GitLab 8.8. + +GitLab provides a health check endpoint for uptime monitoring on the `health_check` web +endpoint. The health check reports on the overall system status based on the status of +the database connection, the state of the database migrations, and the ability to write +and access the cache. This endpoint can be provided to uptime monitoring services like +[Pingdom][pingdom], [Nagios][nagios-health], and [NewRelic][newrelic-health]. + +## Access Token + +An access token needs to be provided while accessing the health check endpoint. The current +accepted token can be found on the `admin/health_check` page of your GitLab instance. + + + +The access token can be passed as a URL parameter: + +``` +https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN +``` + +or as an HTTP header: + +```bash +curl -H "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json +``` + +## Using the Endpoint + +Once you have the access token, health information can be retrieved as plain text, JSON, +or XML using the `health_check` endpoint: + +- `https://gitlab.example.com/health_check?token=ACCESS_TOKEN` +- `https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN` +- `https://gitlab.example.com/health_check.xml?token=ACCESS_TOKEN` + +You can also ask for the status of specific services: + +- `https://gitlab.example.com/health_check/cache.json?token=ACCESS_TOKEN` +- `https://gitlab.example.com/health_check/database.json?token=ACCESS_TOKEN` +- `https://gitlab.example.com/health_check/migrations.json?token=ACCESS_TOKEN` + +For example, the JSON output of the following health check: + +```bash +curl -H "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json +``` + +would be like: + +``` +{"healthy":true,"message":"success"} +``` + +## Status + +On failure, the endpoint will return a `500` HTTP status code. On success, the endpoint +will return a valid successful HTTP status code, and a `success` message. Ideally your +uptime monitoring should look for the success message. + +[ce-3888]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/3888 +[pingdom]: https://www.pingdom.com +[nagios-health]: https://nagios-plugins.org/doc/man/check_http.html +[newrelic-health]: https://docs.newrelic.com/docs/alerts/alert-policies/downtime-alerts/availability-monitoring diff --git a/doc/administration/monitoring/img/health_check_token.png b/doc/administration/monitoring/img/health_check_token.png Binary files differnew file mode 100644 index 00000000000..2daf8606b00 --- /dev/null +++ b/doc/administration/monitoring/img/health_check_token.png diff --git a/doc/administration/monitoring/performance/gitlab_configuration.md b/doc/administration/monitoring/performance/gitlab_configuration.md new file mode 100644 index 00000000000..771584268d9 --- /dev/null +++ b/doc/administration/monitoring/performance/gitlab_configuration.md @@ -0,0 +1,40 @@ +# GitLab Configuration + +GitLab Performance Monitoring is disabled by default. To enable it and change any of its +settings, navigate to the Admin area in **Settings > Metrics** +(`/admin/application_settings`). + +The minimum required settings you need to set are the InfluxDB host and port. +Make sure _Enable InfluxDB Metrics_ is checked and hit **Save** to save the +changes. + +--- + + + +--- + +Finally, a restart of all GitLab processes is required for the changes to take +effect: + +```bash +# For Omnibus installations +sudo gitlab-ctl restart + +# For installations from source +sudo service gitlab restart +``` + +## Pending Migrations + +When any migrations are pending, the metrics are disabled until the migrations +have been performed. + +--- + +Read more on: + +- [Introduction to GitLab Performance Monitoring](introduction.md) +- [InfluxDB Configuration](influxdb_configuration.md) +- [InfluxDB Schema](influxdb_schema.md) +- [Grafana Install/Configuration](grafana_configuration.md) diff --git a/doc/administration/monitoring/performance/grafana_configuration.md b/doc/administration/monitoring/performance/grafana_configuration.md new file mode 100644 index 00000000000..168bd85c26a --- /dev/null +++ b/doc/administration/monitoring/performance/grafana_configuration.md @@ -0,0 +1,149 @@ +# Grafana Configuration + +[Grafana](http://grafana.org/) is a tool that allows you to visualize time +series metrics through graphs and dashboards. It supports several backend +data stores, including InfluxDB. GitLab writes performance data to InfluxDB +and Grafana will allow you to query InfluxDB to display useful graphs. + +For the easiest installation and configuration, install Grafana on the same +server as InfluxDB. For larger installations, you may want to split out these +services. + +## Installation + +Grafana supplies package repositories (Yum/Apt) for easy installation. +See [Grafana installation documentation](http://docs.grafana.org/installation/) +for detailed steps. + +> **Note**: Before starting Grafana for the first time, set the admin user +and password in `/etc/grafana/grafana.ini`. Otherwise, the default password +will be `admin`. + +## Configuration + +Login as the admin user. Expand the menu by clicking the Grafana logo in the +top left corner. Choose 'Data Sources' from the menu. Then, click 'Add new' +in the top bar. + + + +Fill in the configuration details for the InfluxDB data source. Save and +Test Connection to ensure the configuration is correct. + +- **Name**: InfluxDB +- **Default**: Checked +- **Type**: InfluxDB 0.9.x (Even if you're using InfluxDB 0.10.x) +- **Url**: https://localhost:8086 (Or the remote URL if you've installed InfluxDB +on a separate server) +- **Access**: proxy +- **Database**: gitlab +- **User**: admin (Or the username configured when setting up InfluxDB) +- **Password**: The password configured when you set up InfluxDB + + + +## Apply retention policies and create continuous queries + +If you intend to import the GitLab provided Grafana dashboards, you will need +to copy and run a set of queries against InfluxDB to create the needed data +sets. + +On the InfluxDB server, run the following command, substituting your InfluxDB +user and password: + +```bash +influxdb --username admin -password super_secret +``` + +This will drop you in to an InfluxDB interactive session. Copy the entire +contents below and paste it in to the interactive session: + +``` +CREATE RETENTION POLICY default ON gitlab DURATION 1h REPLICATION 1 DEFAULT +CREATE RETENTION POLICY downsampled ON gitlab DURATION 7d REPLICATION 1 +CREATE CONTINUOUS QUERY grape_git_timings_per_action ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.grape_git_timings_per_action FROM gitlab."default".rails_method_calls WHERE (action !~ /.+/ OR action =~ /^Grape#/) AND method =~ /^(Rugged|Gitlab::Git)/ GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY grape_markdown_render_timings_overall ON gitlab BEGIN SELECT mean(banzai_cached_render_real_time) AS cached_real_mean, percentile(banzai_cached_render_real_time, 95) AS cached_real_95th, percentile(banzai_cached_render_real_time, 99) AS cached_real_99th, mean(banzai_cached_render_cpu_time) AS cached_cpu_mean, percentile(banzai_cached_render_cpu_time, 95) AS cached_cpu_95th, percentile(banzai_cached_render_cpu_time, 99) AS cached_cpu_99th, sum(banzai_cached_render_call_count) AS cached_call_count, mean(banzai_cacheless_render_real_time) AS cacheless_real_mean, percentile(banzai_cacheless_render_real_time, 95) AS cacheless_real_95th, percentile(banzai_cacheless_render_real_time, 99) AS cacheless_real_99th, mean(banzai_cacheless_render_cpu_time) AS cacheless_cpu_mean, percentile(banzai_cacheless_render_cpu_time, 95) AS cacheless_cpu_95th, percentile(banzai_cacheless_render_cpu_time, 99) AS cacheless_cpu_99th, sum(banzai_cacheless_render_call_count) AS cacheless_call_count INTO gitlab.downsampled.grape_markdown_render_timings_overall FROM gitlab."default".rails_transactions WHERE (action !~ /.+/ OR action =~ /^Grape#/) AND (banzai_cached_render_call_count > 0 OR banzai_cacheless_render_call_count > 0) GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY grape_markdown_render_timings_per_action ON gitlab BEGIN SELECT mean(banzai_cached_render_real_time) AS cached_real_mean, percentile(banzai_cached_render_real_time, 95) AS cached_real_95th, percentile(banzai_cached_render_real_time, 99) AS cached_real_99th, mean(banzai_cached_render_cpu_time) AS cached_cpu_mean, percentile(banzai_cached_render_cpu_time, 95) AS cached_cpu_95th, percentile(banzai_cached_render_cpu_time, 99) AS cached_cpu_99th, sum(banzai_cached_render_call_count) AS cached_call_count, mean(banzai_cacheless_render_real_time) AS cacheless_real_mean, percentile(banzai_cacheless_render_real_time, 95) AS cacheless_real_95th, percentile(banzai_cacheless_render_real_time, 99) AS cacheless_real_99th, mean(banzai_cacheless_render_cpu_time) AS cacheless_cpu_mean, percentile(banzai_cacheless_render_cpu_time, 95) AS cacheless_cpu_95th, percentile(banzai_cacheless_render_cpu_time, 99) AS cacheless_cpu_99th, sum(banzai_cacheless_render_call_count) AS cacheless_call_count INTO gitlab.downsampled.grape_markdown_render_timings_per_action FROM gitlab."default".rails_transactions WHERE (action !~ /.+/ OR action =~ /^Grape#/) AND (banzai_cached_render_call_count > 0 OR banzai_cacheless_render_call_count > 0) GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY grape_markdown_timings_overall ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.grape_markdown_timings_overall FROM gitlab."default".rails_method_calls WHERE (action !~ /.+/ OR action =~ /^Grape#/) AND method =~ /^Banzai/ GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY grape_method_call_timings_per_action_and_method ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.grape_method_call_timings_per_action_and_method FROM gitlab."default".rails_method_calls WHERE action !~ /.+/ OR action =~ /^Grape#/ GROUP BY time(1m), method, action END; +CREATE CONTINUOUS QUERY grape_method_call_timings_per_method ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.grape_method_call_timings_per_method FROM gitlab."default".rails_method_calls WHERE action !~ /.+/ OR action =~ /^Grape#/ GROUP BY time(1m), method END; +CREATE CONTINUOUS QUERY grape_transaction_counts_overall ON gitlab BEGIN SELECT count("duration") AS count INTO gitlab.downsampled.grape_transaction_counts_overall FROM gitlab."default".rails_transactions WHERE action !~ /.+/ OR action =~ /^Grape#/ GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY grape_transaction_counts_per_action ON gitlab BEGIN SELECT count("duration") AS count INTO gitlab.downsampled.grape_transaction_counts_per_action FROM gitlab."default".rails_transactions WHERE action !~ /.+/ OR action =~ /^Grape#/ GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY grape_transaction_timings_overall ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th, mean(sql_duration) AS sql_duration_mean, percentile(sql_duration, 95) AS sql_duration_95th, percentile(sql_duration, 99) AS sql_duration_99th, mean(view_duration) AS view_duration_mean, percentile(view_duration, 95) AS view_duration_95th, percentile(view_duration, 99) AS view_duration_99th, mean(cache_read_duration) AS cache_read_duration_mean, percentile(cache_read_duration, 99) AS cache_read_duration_99th, percentile(cache_read_duration, 95) AS cache_read_duration_95th, mean(cache_write_duration) AS cache_write_duration_mean, percentile(cache_write_duration, 99) AS cache_write_duration_99th, percentile(cache_write_duration, 95) AS cache_write_duration_95th, mean(cache_delete_duration) AS cache_delete_duration_mean, percentile(cache_delete_duration, 99) AS cache_delete_duration_99th, percentile(cache_delete_duration, 95) AS cache_delete_duration_95th, mean(cache_exists_duration) AS cache_exists_duration_mean, percentile(cache_exists_duration, 99) AS cache_exists_duration_99th, percentile(cache_exists_duration, 95) AS cache_exists_duration_95th, mean(cache_duration) AS cache_duration_mean, percentile(cache_duration, 99) AS cache_duration_99th, percentile(cache_duration, 95) AS cache_duration_95th, mean(method_duration) AS method_duration_mean, percentile(method_duration, 99) AS method_duration_99th, percentile(method_duration, 95) AS method_duration_95th INTO gitlab.downsampled.grape_transaction_timings_overall FROM gitlab."default".rails_transactions WHERE action !~ /.+/ OR action =~ /^Grape#/ GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY grape_transaction_timings_per_action ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th, mean(sql_duration) AS sql_duration_mean, percentile(sql_duration, 95) AS sql_duration_95th, percentile(sql_duration, 99) AS sql_duration_99th, mean(view_duration) AS view_duration_mean, percentile(view_duration, 95) AS view_duration_95th, percentile(view_duration, 99) AS view_duration_99th, mean(cache_read_duration) AS cache_read_duration_mean, percentile(cache_read_duration, 99) AS cache_read_duration_99th, percentile(cache_read_duration, 95) AS cache_read_duration_95th, mean(cache_write_duration) AS cache_write_duration_mean, percentile(cache_write_duration, 99) AS cache_write_duration_99th, percentile(cache_write_duration, 95) AS cache_write_duration_95th, mean(cache_delete_duration) AS cache_delete_duration_mean, percentile(cache_delete_duration, 99) AS cache_delete_duration_99th, percentile(cache_delete_duration, 95) AS cache_delete_duration_95th, mean(cache_exists_duration) AS cache_exists_duration_mean, percentile(cache_exists_duration, 99) AS cache_exists_duration_99th, percentile(cache_exists_duration, 95) AS cache_exists_duration_95th, mean(cache_duration) AS cache_duration_mean, percentile(cache_duration, 99) AS cache_duration_99th, percentile(cache_duration, 95) AS cache_duration_95th, mean(method_duration) AS method_duration_mean, percentile(method_duration, 99) AS method_duration_99th, percentile(method_duration, 95) AS method_duration_95th INTO gitlab.downsampled.grape_transaction_timings_per_action FROM gitlab."default".rails_transactions WHERE action !~ /.+/ OR action =~ /^Grape#/ GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY rails_file_descriptor_counts ON gitlab BEGIN SELECT sum(value) AS count INTO gitlab.downsampled.rails_file_descriptor_counts FROM gitlab."default".rails_file_descriptors GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_gc_counts ON gitlab BEGIN SELECT sum(count) AS total, sum(minor_gc_count) AS minor, sum(major_gc_count) AS major INTO gitlab.downsampled.rails_gc_counts FROM gitlab."default".rails_gc_statistics GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_gc_timings ON gitlab BEGIN SELECT mean(total_time) AS duration_mean, percentile(total_time, 95) AS duration_95th, percentile(total_time, 99) AS duration_99th INTO gitlab.downsampled.rails_gc_timings FROM gitlab."default".rails_gc_statistics GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_git_timings_per_action ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.rails_git_timings_per_action FROM gitlab."default".rails_method_calls WHERE (action =~ /.+/ AND action !~ /^Grape#/) AND method =~ /^(Rugged|Gitlab::Git)/ GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY rails_markdown_render_timings_overall ON gitlab BEGIN SELECT mean(banzai_cached_render_real_time) AS cached_real_mean, percentile(banzai_cached_render_real_time, 95) AS cached_real_95th, percentile(banzai_cached_render_real_time, 99) AS cached_real_99th, mean(banzai_cached_render_cpu_time) AS cached_cpu_mean, percentile(banzai_cached_render_cpu_time, 95) AS cached_cpu_95th, percentile(banzai_cached_render_cpu_time, 99) AS cached_cpu_99th, sum(banzai_cached_render_call_count) AS cached_call_count, mean(banzai_cacheless_render_real_time) AS cacheless_real_mean, percentile(banzai_cacheless_render_real_time, 95) AS cacheless_real_95th, percentile(banzai_cacheless_render_real_time, 99) AS cacheless_real_99th, mean(banzai_cacheless_render_cpu_time) AS cacheless_cpu_mean, percentile(banzai_cacheless_render_cpu_time, 95) AS cacheless_cpu_95th, percentile(banzai_cacheless_render_cpu_time, 99) AS cacheless_cpu_99th, sum(banzai_cacheless_render_call_count) AS cacheless_call_count INTO gitlab.downsampled.rails_markdown_render_timings_overall FROM gitlab."default".rails_transactions WHERE (action =~ /.+/ AND action !~ /^Grape#/) AND (banzai_cached_render_call_count > 0 OR banzai_cacheless_render_call_count > 0) GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_markdown_render_timings_per_action ON gitlab BEGIN SELECT mean(banzai_cached_render_real_time) AS cached_real_mean, percentile(banzai_cached_render_real_time, 95) AS cached_real_95th, percentile(banzai_cached_render_real_time, 99) AS cached_real_99th, mean(banzai_cached_render_cpu_time) AS cached_cpu_mean, percentile(banzai_cached_render_cpu_time, 95) AS cached_cpu_95th, percentile(banzai_cached_render_cpu_time, 99) AS cached_cpu_99th, sum(banzai_cached_render_call_count) AS cached_call_count, mean(banzai_cacheless_render_real_time) AS cacheless_real_mean, percentile(banzai_cacheless_render_real_time, 95) AS cacheless_real_95th, percentile(banzai_cacheless_render_real_time, 99) AS cacheless_real_99th, mean(banzai_cacheless_render_cpu_time) AS cacheless_cpu_mean, percentile(banzai_cacheless_render_cpu_time, 95) AS cacheless_cpu_95th, percentile(banzai_cacheless_render_cpu_time, 99) AS cacheless_cpu_99th, sum(banzai_cacheless_render_call_count) AS cacheless_call_count INTO gitlab.downsampled.rails_markdown_render_timings_per_action FROM gitlab."default".rails_transactions WHERE (action =~ /.+/ AND action !~ /^Grape#/) AND (banzai_cached_render_call_count > 0 OR banzai_cacheless_render_call_count > 0) GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY rails_markdown_timings_overall ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.rails_markdown_timings_overall FROM gitlab."default".rails_method_calls WHERE (action =~ /.+/ AND action !~ /^Grape#/) AND method =~ /^Banzai/ GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_memory_usage_overall ON gitlab BEGIN SELECT mean(value) AS memory_mean, percentile(value, 95) AS memory_95th, percentile(value, 99) AS memory_99th INTO gitlab.downsampled.rails_memory_usage_overall FROM gitlab."default".rails_memory_usage GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_method_call_timings_per_action_and_method ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.rails_method_call_timings_per_action_and_method FROM gitlab."default".rails_method_calls WHERE action =~ /.+/ AND action !~ /^Grape#/ GROUP BY time(1m), method, action END; +CREATE CONTINUOUS QUERY rails_method_call_timings_per_method ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.rails_method_call_timings_per_method FROM gitlab."default".rails_method_calls WHERE action =~ /.+/ AND action !~ /^Grape#/ GROUP BY time(1m), method END; +CREATE CONTINUOUS QUERY rails_object_counts_overall ON gitlab BEGIN SELECT sum(count) AS count INTO gitlab.downsampled.rails_object_counts_overall FROM gitlab."default".rails_object_counts GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_object_counts_per_type ON gitlab BEGIN SELECT sum(count) AS count INTO gitlab.downsampled.rails_object_counts_per_type FROM gitlab."default".rails_object_counts GROUP BY time(1m), type END; +CREATE CONTINUOUS QUERY rails_transaction_counts_overall ON gitlab BEGIN SELECT count("duration") AS count INTO gitlab.downsampled.rails_transaction_counts_overall FROM gitlab."default".rails_transactions WHERE action =~ /.+/ AND action !~ /^Grape#/ GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_transaction_counts_per_action ON gitlab BEGIN SELECT count("duration") AS count INTO gitlab.downsampled.rails_transaction_counts_per_action FROM gitlab."default".rails_transactions WHERE action =~ /.+/ AND action !~ /^Grape#/ GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY rails_transaction_timings_overall ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th, mean(sql_duration) AS sql_duration_mean, percentile(sql_duration, 95) AS sql_duration_95th, percentile(sql_duration, 99) AS sql_duration_99th, mean(view_duration) AS view_duration_mean, percentile(view_duration, 95) AS view_duration_95th, percentile(view_duration, 99) AS view_duration_99th, mean(cache_read_duration) AS cache_read_duration_mean, percentile(cache_read_duration, 99) AS cache_read_duration_99th, percentile(cache_read_duration, 95) AS cache_read_duration_95th, mean(cache_write_duration) AS cache_write_duration_mean, percentile(cache_write_duration, 99) AS cache_write_duration_99th, percentile(cache_write_duration, 95) AS cache_write_duration_95th, mean(cache_delete_duration) AS cache_delete_duration_mean, percentile(cache_delete_duration, 99) AS cache_delete_duration_99th, percentile(cache_delete_duration, 95) AS cache_delete_duration_95th, mean(cache_exists_duration) AS cache_exists_duration_mean, percentile(cache_exists_duration, 99) AS cache_exists_duration_99th, percentile(cache_exists_duration, 95) AS cache_exists_duration_95th, mean(cache_duration) AS cache_duration_mean, percentile(cache_duration, 99) AS cache_duration_99th, percentile(cache_duration, 95) AS cache_duration_95th, mean(method_duration) AS method_duration_mean, percentile(method_duration, 99) AS method_duration_99th, percentile(method_duration, 95) AS method_duration_95th INTO gitlab.downsampled.rails_transaction_timings_overall FROM gitlab."default".rails_transactions WHERE action =~ /.+/ AND action !~ /^Grape#/ GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY rails_transaction_timings_per_action ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th, mean(sql_duration) AS sql_duration_mean, percentile(sql_duration, 95) AS sql_duration_95th, percentile(sql_duration, 99) AS sql_duration_99th, mean(view_duration) AS view_duration_mean, percentile(view_duration, 95) AS view_duration_95th, percentile(view_duration, 99) AS view_duration_99th, mean(cache_read_duration) AS cache_read_duration_mean, percentile(cache_read_duration, 99) AS cache_read_duration_99th, percentile(cache_read_duration, 95) AS cache_read_duration_95th, mean(cache_write_duration) AS cache_write_duration_mean, percentile(cache_write_duration, 99) AS cache_write_duration_99th, percentile(cache_write_duration, 95) AS cache_write_duration_95th, mean(cache_delete_duration) AS cache_delete_duration_mean, percentile(cache_delete_duration, 99) AS cache_delete_duration_99th, percentile(cache_delete_duration, 95) AS cache_delete_duration_95th, mean(cache_exists_duration) AS cache_exists_duration_mean, percentile(cache_exists_duration, 99) AS cache_exists_duration_99th, percentile(cache_exists_duration, 95) AS cache_exists_duration_95th, mean(cache_duration) AS cache_duration_mean, percentile(cache_duration, 99) AS cache_duration_99th, percentile(cache_duration, 95) AS cache_duration_95th, mean(method_duration) AS method_duration_mean, percentile(method_duration, 99) AS method_duration_99th, percentile(method_duration, 95) AS method_duration_95th INTO gitlab.downsampled.rails_transaction_timings_per_action FROM gitlab."default".rails_transactions WHERE action =~ /.+/ AND action !~ /^Grape#/ GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY rails_view_timings_per_action_and_view ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.rails_view_timings_per_action_and_view FROM gitlab."default".rails_views WHERE action =~ /.+/ AND action !~ /^Grape#/ GROUP BY time(1m), action, view END; +CREATE CONTINUOUS QUERY sidekiq_file_descriptor_counts ON gitlab BEGIN SELECT sum(value) AS count INTO gitlab.downsampled.sidekiq_file_descriptor_counts FROM gitlab."default".sidekiq_file_descriptors GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_gc_counts ON gitlab BEGIN SELECT sum(count) AS total, sum(minor_gc_count) AS minor, sum(major_gc_count) AS major INTO gitlab.downsampled.sidekiq_gc_counts FROM gitlab."default".sidekiq_gc_statistics GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_gc_timings ON gitlab BEGIN SELECT mean(total_time) AS duration_mean, percentile(total_time, 95) AS duration_95th, percentile(total_time, 99) AS duration_99th INTO gitlab.downsampled.sidekiq_gc_timings FROM gitlab."default".sidekiq_gc_statistics GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_git_timings_per_action ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.sidekiq_git_timings_per_action FROM gitlab."default".sidekiq_method_calls WHERE method =~ /^(Rugged|Gitlab::Git)/ GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY sidekiq_markdown_render_timings_overall ON gitlab BEGIN SELECT mean(banzai_cached_render_real_time) AS cached_real_mean, percentile(banzai_cached_render_real_time, 95) AS cached_real_95th, percentile(banzai_cached_render_real_time, 99) AS cached_real_99th, mean(banzai_cached_render_cpu_time) AS cached_cpu_mean, percentile(banzai_cached_render_cpu_time, 95) AS cached_cpu_95th, percentile(banzai_cached_render_cpu_time, 99) AS cached_cpu_99th, sum(banzai_cached_render_call_count) AS cached_call_count, mean(banzai_cacheless_render_real_time) AS cacheless_real_mean, percentile(banzai_cacheless_render_real_time, 95) AS cacheless_real_95th, percentile(banzai_cacheless_render_real_time, 99) AS cacheless_real_99th, mean(banzai_cacheless_render_cpu_time) AS cacheless_cpu_mean, percentile(banzai_cacheless_render_cpu_time, 95) AS cacheless_cpu_95th, percentile(banzai_cacheless_render_cpu_time, 99) AS cacheless_cpu_99th, sum(banzai_cacheless_render_call_count) AS cacheless_call_count INTO gitlab.downsampled.sidekiq_markdown_render_timings_overall FROM gitlab."default".sidekiq_transactions WHERE (banzai_cached_render_call_count > 0 OR banzai_cacheless_render_call_count > 0) GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_markdown_render_timings_per_action ON gitlab BEGIN SELECT mean(banzai_cached_render_real_time) AS cached_real_mean, percentile(banzai_cached_render_real_time, 95) AS cached_real_95th, percentile(banzai_cached_render_real_time, 99) AS cached_real_99th, mean(banzai_cached_render_cpu_time) AS cached_cpu_mean, percentile(banzai_cached_render_cpu_time, 95) AS cached_cpu_95th, percentile(banzai_cached_render_cpu_time, 99) AS cached_cpu_99th, sum(banzai_cached_render_call_count) AS cached_call_count, mean(banzai_cacheless_render_real_time) AS cacheless_real_mean, percentile(banzai_cacheless_render_real_time, 95) AS cacheless_real_95th, percentile(banzai_cacheless_render_real_time, 99) AS cacheless_real_99th, mean(banzai_cacheless_render_cpu_time) AS cacheless_cpu_mean, percentile(banzai_cacheless_render_cpu_time, 95) AS cacheless_cpu_95th, percentile(banzai_cacheless_render_cpu_time, 99) AS cacheless_cpu_99th, sum(banzai_cacheless_render_call_count) AS cacheless_call_count INTO gitlab.downsampled.sidekiq_markdown_render_timings_per_action FROM gitlab."default".sidekiq_transactions WHERE (banzai_cached_render_call_count > 0 OR banzai_cacheless_render_call_count > 0) GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY sidekiq_markdown_timings_overall ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.sidekiq_markdown_timings_overall FROM gitlab."default".sidekiq_method_calls WHERE method =~ /^Banzai/ GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_memory_usage_overall ON gitlab BEGIN SELECT mean(value) AS memory_mean, percentile(value, 95) AS memory_95th, percentile(value, 99) AS memory_99th INTO gitlab.downsampled.sidekiq_memory_usage_overall FROM gitlab."default".sidekiq_memory_usage GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_method_call_timings_per_action_and_method ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.sidekiq_method_call_timings_per_action_and_method FROM gitlab."default".sidekiq_method_calls GROUP BY time(1m), method, action END; +CREATE CONTINUOUS QUERY sidekiq_method_call_timings_per_method ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.sidekiq_method_call_timings_per_method FROM gitlab."default".sidekiq_method_calls GROUP BY time(1m), method END; +CREATE CONTINUOUS QUERY sidekiq_object_counts_overall ON gitlab BEGIN SELECT sum(count) AS count INTO gitlab.downsampled.sidekiq_object_counts_overall FROM gitlab."default".sidekiq_object_counts GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_object_counts_per_type ON gitlab BEGIN SELECT sum(count) AS count INTO gitlab.downsampled.sidekiq_object_counts_per_type FROM gitlab."default".sidekiq_object_counts GROUP BY time(1m), type END; +CREATE CONTINUOUS QUERY sidekiq_transaction_counts_overall ON gitlab BEGIN SELECT count("duration") AS count INTO gitlab.downsampled.sidekiq_transaction_counts_overall FROM gitlab."default".sidekiq_transactions GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_transaction_counts_per_action ON gitlab BEGIN SELECT count("duration") AS count INTO gitlab.downsampled.sidekiq_transaction_counts_per_action FROM gitlab."default".sidekiq_transactions GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY sidekiq_transaction_timings_overall ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th, mean(sql_duration) AS sql_duration_mean, percentile(sql_duration, 95) AS sql_duration_95th, percentile(sql_duration, 99) AS sql_duration_99th, mean(view_duration) AS view_duration_mean, percentile(view_duration, 95) AS view_duration_95th, percentile(view_duration, 99) AS view_duration_99th, mean(cache_read_duration) AS cache_read_duration_mean, percentile(cache_read_duration, 99) AS cache_read_duration_99th, percentile(cache_read_duration, 95) AS cache_read_duration_95th, mean(cache_write_duration) AS cache_write_duration_mean, percentile(cache_write_duration, 99) AS cache_write_duration_99th, percentile(cache_write_duration, 95) AS cache_write_duration_95th, mean(cache_delete_duration) AS cache_delete_duration_mean, percentile(cache_delete_duration, 99) AS cache_delete_duration_99th, percentile(cache_delete_duration, 95) AS cache_delete_duration_95th, mean(cache_exists_duration) AS cache_exists_duration_mean, percentile(cache_exists_duration, 99) AS cache_exists_duration_99th, percentile(cache_exists_duration, 95) AS cache_exists_duration_95th, mean(cache_duration) AS cache_duration_mean, percentile(cache_duration, 99) AS cache_duration_99th, percentile(cache_duration, 95) AS cache_duration_95th, mean(method_duration) AS method_duration_mean, percentile(method_duration, 99) AS method_duration_99th, percentile(method_duration, 95) AS method_duration_95th INTO gitlab.downsampled.sidekiq_transaction_timings_overall FROM gitlab."default".sidekiq_transactions GROUP BY time(1m) END; +CREATE CONTINUOUS QUERY sidekiq_transaction_timings_per_action ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th, mean(sql_duration) AS sql_duration_mean, percentile(sql_duration, 95) AS sql_duration_95th, percentile(sql_duration, 99) AS sql_duration_99th, mean(view_duration) AS view_duration_mean, percentile(view_duration, 95) AS view_duration_95th, percentile(view_duration, 99) AS view_duration_99th, mean(cache_read_duration) AS cache_read_duration_mean, percentile(cache_read_duration, 99) AS cache_read_duration_99th, percentile(cache_read_duration, 95) AS cache_read_duration_95th, mean(cache_write_duration) AS cache_write_duration_mean, percentile(cache_write_duration, 99) AS cache_write_duration_99th, percentile(cache_write_duration, 95) AS cache_write_duration_95th, mean(cache_delete_duration) AS cache_delete_duration_mean, percentile(cache_delete_duration, 99) AS cache_delete_duration_99th, percentile(cache_delete_duration, 95) AS cache_delete_duration_95th, mean(cache_exists_duration) AS cache_exists_duration_mean, percentile(cache_exists_duration, 99) AS cache_exists_duration_99th, percentile(cache_exists_duration, 95) AS cache_exists_duration_95th, mean(cache_duration) AS cache_duration_mean, percentile(cache_duration, 99) AS cache_duration_99th, percentile(cache_duration, 95) AS cache_duration_95th, mean(method_duration) AS method_duration_mean, percentile(method_duration, 99) AS method_duration_99th, percentile(method_duration, 95) AS method_duration_95th INTO gitlab.downsampled.sidekiq_transaction_timings_per_action FROM gitlab."default".sidekiq_transactions GROUP BY time(1m), action END; +CREATE CONTINUOUS QUERY sidekiq_view_timings_per_action_and_view ON gitlab BEGIN SELECT mean("duration") AS duration_mean, percentile("duration", 95) AS duration_95th, percentile("duration", 99) AS duration_99th INTO gitlab.downsampled.sidekiq_view_timings_per_action_and_view FROM gitlab."default".sidekiq_views GROUP BY time(1m), action, view END; +CREATE CONTINUOUS QUERY web_transaction_counts_overall ON gitlab BEGIN SELECT count("duration") AS count INTO gitlab.downsampled.web_transaction_counts_overall FROM gitlab."default".rails_transactions GROUP BY time(1m) END; +``` + +## Import Dashboards + +You can now import a set of default dashboards that will give you a good +start on displaying useful information. GitLab has published a set of default +[Grafana dashboards][grafana-dashboards] to get you started. Clone the +repository or download a zip/tarball, then follow these steps to import each +JSON file. + +Open the dashboard dropdown menu and click 'Import' + + + +Click 'Choose file' and browse to the location where you downloaded or cloned +the dashboard repository. Pick one of the JSON files to import. + + + +Once the dashboard is imported, be sure to click save icon in the top bar. If +you do not save the dashboard after importing it will be removed when you +navigate away. + + + +Repeat this process for each dashboard you wish to import. + +Alternatively you can automatically import all the dashboards into your Grafana +instance. See the README of the [Grafana dashboards][grafana-dashboards] +repository for more information on this process. + +[grafana-dashboards]: https://gitlab.com/gitlab-org/grafana-dashboards + +--- + +Read more on: + +- [Introduction to GitLab Performance Monitoring](introduction.md) +- [GitLab Configuration](gitlab_configuration.md) +- [InfluxDB Installation/Configuration](influxdb_configuration.md) +- [InfluxDB Schema](influxdb_schema.md) diff --git a/doc/administration/monitoring/performance/img/grafana_dashboard_dropdown.png b/doc/administration/monitoring/performance/img/grafana_dashboard_dropdown.png Binary files differnew file mode 100644 index 00000000000..b4448c7a09f --- /dev/null +++ b/doc/administration/monitoring/performance/img/grafana_dashboard_dropdown.png diff --git a/doc/administration/monitoring/performance/img/grafana_dashboard_import.png b/doc/administration/monitoring/performance/img/grafana_dashboard_import.png Binary files differnew file mode 100644 index 00000000000..5a2d3c0937a --- /dev/null +++ b/doc/administration/monitoring/performance/img/grafana_dashboard_import.png diff --git a/doc/administration/monitoring/performance/img/grafana_data_source_configuration.png b/doc/administration/monitoring/performance/img/grafana_data_source_configuration.png Binary files differnew file mode 100644 index 00000000000..7e2e111f570 --- /dev/null +++ b/doc/administration/monitoring/performance/img/grafana_data_source_configuration.png diff --git a/doc/administration/monitoring/performance/img/grafana_data_source_empty.png b/doc/administration/monitoring/performance/img/grafana_data_source_empty.png Binary files differnew file mode 100644 index 00000000000..11e27571e64 --- /dev/null +++ b/doc/administration/monitoring/performance/img/grafana_data_source_empty.png diff --git a/doc/administration/monitoring/performance/img/grafana_save_icon.png b/doc/administration/monitoring/performance/img/grafana_save_icon.png Binary files differnew file mode 100644 index 00000000000..3d4265bee8e --- /dev/null +++ b/doc/administration/monitoring/performance/img/grafana_save_icon.png diff --git a/doc/administration/monitoring/performance/img/metrics_gitlab_configuration_settings.png b/doc/administration/monitoring/performance/img/metrics_gitlab_configuration_settings.png Binary files differnew file mode 100644 index 00000000000..14d82b6ac98 --- /dev/null +++ b/doc/administration/monitoring/performance/img/metrics_gitlab_configuration_settings.png diff --git a/doc/administration/monitoring/performance/influxdb_configuration.md b/doc/administration/monitoring/performance/influxdb_configuration.md new file mode 100644 index 00000000000..c30cd2950d8 --- /dev/null +++ b/doc/administration/monitoring/performance/influxdb_configuration.md @@ -0,0 +1,193 @@ +# InfluxDB Configuration + +The default settings provided by [InfluxDB] are not sufficient for a high traffic +GitLab environment. The settings discussed in this document are based on the +settings GitLab uses for GitLab.com, depending on your own needs you may need to +further adjust them. + +If you are intending to run InfluxDB on the same server as GitLab, make sure +you have plenty of RAM since InfluxDB can use quite a bit depending on traffic. + +Unless you are going with a budget setup, it's advised to run it separately. + +## Requirements + +- InfluxDB 0.9.5 or newer +- A fairly modern version of Linux +- At least 4GB of RAM +- At least 10GB of storage for InfluxDB data + +Note that the RAM and storage requirements can differ greatly depending on the +amount of data received/stored. To limit the amount of stored data users can +look into [InfluxDB Retention Policies][influxdb-retention]. + +## Installation + +Installing InfluxDB is out of the scope of this document. Please refer to the +[InfluxDB documentation]. + +## InfluxDB Server Settings + +Since InfluxDB has many settings that users may wish to customize themselves +(e.g. what port to run InfluxDB on), we'll only cover the essentials. + +The configuration file in question is usually located at +`/etc/influxdb/influxdb.conf`. Whenever you make a change in this file, +InfluxDB needs to be restarted. + +### Storage Engine + +InfluxDB comes with different storage engines and as of InfluxDB 0.9.5 a new +storage engine is available, called [TSM Tree]. All users **must** use the new +`tsm1` storage engine as this [will be the default engine][tsm1-commit] in +upcoming InfluxDB releases. + +Make sure you have the following in your configuration file: + +``` +[data] + dir = "/var/lib/influxdb/data" + engine = "tsm1" +``` + +### Admin Panel + +Production environments should have the InfluxDB admin panel **disabled**. This +feature can be disabled by adding the following to your InfluxDB configuration +file: + +``` +[admin] + enabled = false +``` + +### HTTP + +HTTP is required when using the [InfluxDB CLI] or other tools such as Grafana, +thus it should be enabled. When enabling make sure to _also_ enable +authentication: + +``` +[http] + enabled = true + auth-enabled = true +``` + +_**Note:** Before you enable authentication, you might want to [create an +admin user](#create-a-new-admin-user)._ + +### UDP + +GitLab writes data to InfluxDB via UDP and thus this must be enabled. Enabling +UDP can be done using the following settings: + +``` +[[udp]] + enabled = true + bind-address = ":8089" + database = "gitlab" + batch-size = 1000 + batch-pending = 5 + batch-timeout = "1s" + read-buffer = 209715200 +``` + +This does the following: + +1. Enable UDP and bind it to port 8089 for all addresses. +2. Store any data received in the "gitlab" database. +3. Define a batch of points to be 1000 points in size and allow a maximum of + 5 batches _or_ flush them automatically after 1 second. +4. Define a UDP read buffer size of 200 MB. + +One of the most important settings here is the UDP read buffer size as if this +value is set too low, packets will be dropped. You must also make sure the OS +buffer size is set to the same value, the default value is almost never enough. + +To set the OS buffer size to 200 MB, on Linux you can run the following command: + +```bash +sysctl -w net.core.rmem_max=209715200 +``` + +To make this permanent, add the following to `/etc/sysctl.conf` and restart the +server: + +```bash +net.core.rmem_max=209715200 +``` + +It is **very important** to make sure the buffer sizes are large enough to +handle all data sent to InfluxDB as otherwise you _will_ lose data. The above +buffer sizes are based on the traffic for GitLab.com. Depending on the amount of +traffic, users may be able to use a smaller buffer size, but we highly recommend +using _at least_ 100 MB. + +When enabling UDP, users should take care to not expose the port to the public, +as doing so will allow anybody to write data into your InfluxDB database (as +[InfluxDB's UDP protocol][udp] doesn't support authentication). We recommend either +whitelisting the allowed IP addresses/ranges, or setting up a VLAN and only +allowing traffic from members of said VLAN. + +## Create a new admin user + +If you want to [enable authentication](#http), you might want to [create an +admin user][influx-admin]: + +``` +influx -execute "CREATE USER jeff WITH PASSWORD '1234' WITH ALL PRIVILEGES" +``` + +## Create the `gitlab` database + +Once you get InfluxDB up and running, you need to create a database for GitLab. +Make sure you have changed the [storage engine](#storage-engine) to `tsm1` +before creating a database. + +_**Note:** If you [created an admin user](#create-a-new-admin-user) and enabled +[HTTP authentication](#http), remember to append the username (`-username <username>`) +and password (`-password <password>`) you set earlier to the commands below._ + +Run the following command to create a database named `gitlab`: + +```bash +influx -execute 'CREATE DATABASE gitlab' +``` + +The name **must** be `gitlab`, do not use any other name. + +Next, make sure that the database was successfully created: + +```bash +influx -execute 'SHOW DATABASES' +``` + +The output should be similar to: + +``` +name: databases +--------------- +name +_internal +gitlab +``` + +That's it! Now your GitLab instance should send data to InfluxDB. + +--- + +Read more on: + +- [Introduction to GitLab Performance Monitoring](introduction.md) +- [GitLab Configuration](gitlab_configuration.md) +- [InfluxDB Schema](influxdb_schema.md) +- [Grafana Install/Configuration](grafana_configuration.md) + +[influxdb-retention]: https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/#retention-policy-management +[influxdb documentation]: https://docs.influxdata.com/influxdb/v0.9/ +[influxdb cli]: https://docs.influxdata.com/influxdb/v0.9/tools/shell/ +[udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/ +[influxdb]: https://influxdata.com/time-series-platform/influxdb/ +[tsm tree]: https://influxdata.com/blog/new-storage-engine-time-structured-merge-tree/ +[tsm1-commit]: https://github.com/influxdata/influxdb/commit/15d723dc77651bac83e09e2b1c94be480966cb0d +[influx-admin]: https://docs.influxdata.com/influxdb/v0.9/administration/authentication_and_authorization/#create-a-new-admin-user diff --git a/doc/administration/monitoring/performance/influxdb_schema.md b/doc/administration/monitoring/performance/influxdb_schema.md new file mode 100644 index 00000000000..41861860b6d --- /dev/null +++ b/doc/administration/monitoring/performance/influxdb_schema.md @@ -0,0 +1,88 @@ +# InfluxDB Schema + +The following measurements are currently stored in InfluxDB: + +- `PROCESS_file_descriptors` +- `PROCESS_gc_statistics` +- `PROCESS_memory_usage` +- `PROCESS_method_calls` +- `PROCESS_object_counts` +- `PROCESS_transactions` +- `PROCESS_views` + +Here, `PROCESS` is replaced with either `rails` or `sidekiq` depending on the +process type. In all series, any form of duration is stored in milliseconds. + +## PROCESS_file_descriptors + +This measurement contains the number of open file descriptors over time. The +value field `value` contains the number of descriptors. + +## PROCESS_gc_statistics + +This measurement contains Ruby garbage collection statistics such as the amount +of minor/major GC runs (relative to the last sampling interval), the time spent +in garbage collection cycles, and all fields/values returned by `GC.stat`. + +## PROCESS_memory_usage + +This measurement contains the process' memory usage (in bytes) over time. The +value field `value` contains the number of bytes. + +## PROCESS_method_calls + +This measurement contains the methods called during a transaction along with +their duration, and a name of the transaction action that invoked the method (if +available). The method call duration is stored in the value field `duration`, +while the method name is stored in the tag `method`. The tag `action` contains +the full name of the transaction action. Both the `method` and `action` fields +are in the following format: + +``` +ClassName#method_name +``` + +For example, a method called by the `show` method in the `UsersController` class +would have `action` set to `UsersController#show`. + +## PROCESS_object_counts + +This measurement is used to store retained Ruby objects (per class) and the +amount of retained objects. The number of objects is stored in the `count` value +field while the class name is stored in the `type` tag. + +## PROCESS_transactions + +This measurement is used to store basic transaction details such as the time it +took to complete a transaction, how much time was spent in SQL queries, etc. The +following value fields are available: + +| Value | Description | +| ----- | ----------- | +| `duration` | The total duration of the transaction | +| `allocated_memory` | The amount of bytes allocated while the transaction was running. This value is only reliable when using single-threaded application servers | +| `method_duration` | The total time spent in method calls | +| `sql_duration` | The total time spent in SQL queries | +| `view_duration` | The total time spent in views | + +## PROCESS_views + +This measurement is used to store view rendering timings for a transaction. The +following value fields are available: + +| Value | Description | +| ----- | ----------- | +| `duration` | The rendering time of the view | +| `view` | The path of the view, relative to the application's root directory | + +The `action` tag contains the action name of the transaction that rendered the +view. + +--- + +Read more on: + +- [Introduction to GitLab Performance Monitoring](introduction.md) +- [GitLab Configuration](gitlab_configuration.md) +- [InfluxDB Configuration](influxdb_configuration.md) +- [Grafana Install/Configuration](grafana_configuration.md) diff --git a/doc/administration/monitoring/performance/introduction.md b/doc/administration/monitoring/performance/introduction.md new file mode 100644 index 00000000000..79904916b7e --- /dev/null +++ b/doc/administration/monitoring/performance/introduction.md @@ -0,0 +1,65 @@ +# GitLab Performance Monitoring + +GitLab comes with its own application performance measuring system as of GitLab +8.4, simply called "GitLab Performance Monitoring". GitLab Performance Monitoring is available in both the +Community and Enterprise editions. + +Apart from this introduction, you are advised to read through the following +documents in order to understand and properly configure GitLab Performance Monitoring: + +- [GitLab Configuration](gitlab_configuration.md) +- [InfluxDB Install/Configuration](influxdb_configuration.md) +- [InfluxDB Schema](influxdb_schema.md) +- [Grafana Install/Configuration](grafana_configuration.md) + +## Introduction to GitLab Performance Monitoring + +GitLab Performance Monitoring makes it possible to measure a wide variety of statistics +including (but not limited to): + +- The time it took to complete a transaction (a web request or Sidekiq job). +- The time spent in running SQL queries and rendering HAML views. +- The time spent executing (instrumented) Ruby methods. +- Ruby object allocations, and retained objects in particular. +- System statistics such as the process' memory usage and open file descriptors. +- Ruby garbage collection statistics. + +Metrics data is written to [InfluxDB][influxdb] over [UDP][influxdb-udp]. Stored +data can be visualized using [Grafana][grafana] or any other application that +supports reading data from InfluxDB. Alternatively data can be queried using the +InfluxDB CLI. + +## Metric Types + +Two types of metrics are collected: + +1. Transaction specific metrics. +1. Sampled metrics, collected at a certain interval in a separate thread. + +### Transaction Metrics + +Transaction metrics are metrics that can be associated with a single +transaction. This includes statistics such as the transaction duration, timings +of any executed SQL queries, time spent rendering HAML views, etc. These metrics +are collected for every Rack request and Sidekiq job processed. + +### Sampled Metrics + +Sampled metrics are metrics that can't be associated with a single transaction. +Examples include garbage collection statistics and retained Ruby objects. These +metrics are collected at a regular interval. This interval is made up out of two +parts: + +1. A user defined interval. +1. A randomly generated offset added on top of the interval, the same offset + can't be used twice in a row. + +The actual interval can be anywhere between a half of the defined interval and a +half above the interval. For example, for a user defined interval of 15 seconds +the actual interval can be anywhere between 7.5 and 22.5. The interval is +re-generated for every sampling run instead of being generated once and re-used +for the duration of the process' lifetime. + +[influxdb]: https://influxdata.com/time-series-platform/influxdb/ +[influxdb-udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/ +[grafana]: http://grafana.org/ |