From d8f33c0a51d9106ece6cd4bae469e40734e05f85 Mon Sep 17 00:00:00 2001
From: Achilleas Pipinellis
Date: Sun, 25 Sep 2016 12:44:09 +0200
Subject: Move operations/ to new location [ci skip]

---
 doc/README.md                                      |   2 +-
 doc/administration/operations.md                   |   6 +
 .../operations/cleaning_up_redis_sessions.md       |  52 ++++++
 .../operations/moving_repositories.md              | 180 ++++++++++++++++++++
 .../operations/sidekiq_memory_killer.md            |  40 +++++
 doc/administration/operations/unicorn.md           |  86 ++++++++++
 doc/operations/README.md                           |   6 +-
 doc/operations/cleaning_up_redis_sessions.md       |  53 +-----
 doc/operations/moving_repositories.md              | 181 +--------------------
 doc/operations/sidekiq_memory_killer.md            |  41 +----
 doc/operations/unicorn.md                          |  87 +---------
 11 files changed, 370 insertions(+), 364 deletions(-)
 create mode 100644 doc/administration/operations.md
 create mode 100644 doc/administration/operations/cleaning_up_redis_sessions.md
 create mode 100644 doc/administration/operations/moving_repositories.md
 create mode 100644 doc/administration/operations/sidekiq_memory_killer.md
 create mode 100644 doc/administration/operations/unicorn.md

diff --git a/doc/README.md b/doc/README.md
index dd0eb97489e..ed2d09bedec 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -34,7 +34,7 @@
 - [Libravatar](customization/libravatar.md) Use Libravatar instead of Gravatar for user avatars.
 - [Log system](administration/logs.md) Log system.
 - [Environment Variables](administration/environment_variables.md) to configure GitLab.
-- [Operations](operations/README.md) Keeping GitLab up and running.
+- [Operations](administration/operations.md) Keeping GitLab up and running.
 - [Raketasks](raketasks/README.md) Backups, maintenance, automatic webhook setup and the importing of projects.
 - [Repository checks](administration/repository_checks.md) Periodic Git repository checks.
 - [Repository storages](administration/repository_storages.md) Manage the paths used to store repositories.
diff --git a/doc/administration/operations.md b/doc/administration/operations.md
new file mode 100644
index 00000000000..4b582d16b64
--- /dev/null
+++ b/doc/administration/operations.md
@@ -0,0 +1,6 @@
+# GitLab operations
+
+- [Sidekiq MemoryKiller](operations/sidekiq_memory_killer.md)
+- [Cleaning up Redis sessions](operations/cleaning_up_redis_sessions.md)
+- [Understanding Unicorn and unicorn-worker-killer](operations/unicorn.md)
+- [Moving repositories to a new location](operations/moving_repositories.md)
diff --git a/doc/administration/operations/cleaning_up_redis_sessions.md b/doc/administration/operations/cleaning_up_redis_sessions.md
new file mode 100644
index 00000000000..93521e976d5
--- /dev/null
+++ b/doc/administration/operations/cleaning_up_redis_sessions.md
@@ -0,0 +1,52 @@
+# Cleaning up stale Redis sessions
+
+Since version 6.2, GitLab stores web user sessions as key-value pairs in Redis.
+Prior to GitLab 7.3, user sessions did not automatically expire from Redis. If
+you have been running a large GitLab server (thousands of users) since before
+GitLab 7.3 we recommend cleaning up stale sessions to compact the Redis
+database after you upgrade to GitLab 7.3. You can also perform a cleanup while
+still running GitLab 7.2 or older, but in that case new stale sessions will
+start building up again after you clean up.
+
+In GitLab versions prior to 7.3.0, the session keys in Redis are 16-byte
+hexadecimal values such as '976aa289e2189b17d7ef525a6702ace9'. Starting with
+GitLab 7.3.0, the keys are
+prefixed with 'session:gitlab:', so they would look like
+'session:gitlab:976aa289e2189b17d7ef525a6702ace9'. Below we describe how to
+remove the keys in the old format.
+
+First we define a shell function with the proper Redis connection details.
+
+```
+rcli() {
+  # This example works for Omnibus installations of GitLab 7.3 or newer. For an
+  # installation from source you will have to change the socket path and the
+  # path to redis-cli.
+  sudo /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket "$@"
+}
+
+# test the new shell function; the response should be PONG
+rcli ping
+```
+
+Now we do a search to see if there are any session keys in the old format for
+us to clean up.
+
+```
+# returns the number of old-format session keys in Redis
+rcli keys '*' | grep '^[a-f0-9]\{32\}$' | wc -l
+```
+
+If the number is larger than zero, you can proceed to expire the keys from
+Redis. If the number is zero there is nothing to clean up.
+
+```
+# Tell Redis to expire each matched key after 600 seconds.
+rcli keys '*' | grep '^[a-f0-9]\{32\}$' | awk '{ print "expire", $0, 600 }' | rcli
+# This will print '(integer) 1' for each key that gets expired.
+```
+
+Over the next 15 minutes (10 minutes expiry time plus 5 minutes Redis
+background save interval) your Redis database will be compacted. If you are
+still using GitLab 7.2, users who are not clicking around in GitLab during the
+10 minute expiry window will be signed out of GitLab.
diff --git a/doc/administration/operations/moving_repositories.md b/doc/administration/operations/moving_repositories.md
new file mode 100644
index 00000000000..54adb99386a
--- /dev/null
+++ b/doc/administration/operations/moving_repositories.md
@@ -0,0 +1,180 @@
+# Moving repositories managed by GitLab
+
+Sometimes you need to move all repositories managed by GitLab to
+another filesystem or another server. In this document we will look
+at some of the ways you can copy all your repositories from
+`/var/opt/gitlab/git-data/repositories` to `/mnt/gitlab/repositories`.
+
+We will look at three scenarios: the target directory is empty, the
+target directory contains an outdated copy of the repositories, and
+how to deal with thousands of repositories.
+
+**Each of the approaches we list can/will overwrite data in the
+target directory `/mnt/gitlab/repositories`. Do not mix up the
+source and the target.**
+
+## Target directory is empty: use a tar pipe
+
+If the target directory `/mnt/gitlab/repositories` is empty the
+simplest thing to do is to use a tar pipe. This method has low
+overhead and tar is almost always already installed on your system.
+However, it is not possible to resume an interrupted tar pipe: if
+that happens then all data must be copied again.
+
+```
+# As the git user
+tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
+  tar -C /mnt/gitlab/repositories -xf -
+```
+
+If you want to see progress, replace `-xf` with `-xvf`.
+
+### Tar pipe to another server
+
+You can also use a tar pipe to copy data to another server. If your
+'git' user has SSH access to the new server as 'git@newserver', you
+can pipe the data through SSH.
+
+```
+# As the git user
+tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
+  ssh git@newserver tar -C /mnt/gitlab/repositories -xf -
+```
+
+If you want to compress the data before it goes over the network
+(which will cost you CPU cycles) you can replace `ssh` with `ssh -C`.
+
+## The target directory contains an outdated copy of the repositories: use rsync
+
+If the target directory already contains a partial / outdated copy
+of the repositories it may be wasteful to copy all the data again
+with tar. In this scenario it is better to use rsync. This utility
+is either already installed on your system or easily installable
+via apt, yum etc.
+
+```
+# As the 'git' user
+rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
+  /mnt/gitlab/repositories
+```
+
+The `/.` in the command above is very important; without it you can
+easily get the wrong directory structure in the target directory.
+If you want to see progress, replace `-a` with `-av`.
+
+### Single rsync to another server
+
+If the 'git' user on your source system has SSH access to the target
+server you can send the repositories over the network with rsync.
+
+```
+# As the 'git' user
+rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
+  git@newserver:/mnt/gitlab/repositories
+```
+
+## Thousands of Git repositories: use one rsync per repository
+
+Every time you start an rsync job it has to inspect all files in
+the source directory, all files in the target directory, and then
+decide what files to copy or not. If the source or target directory
+has many contents this startup phase of rsync can become a burden
+for your GitLab server. In cases like this you can make rsync's
+life easier by dividing its work in smaller pieces, and sync one
+repository at a time.
+
+In addition to rsync we will use [GNU
+Parallel](http://www.gnu.org/software/parallel/). This utility is
+not included in GitLab so you need to install it yourself with apt
+or yum. Also note that the GitLab scripts we used below were added
+in GitLab 8.1.
+
+**This process does not clean up repositories at the target location that no
+longer exist at the source.** If you start using your GitLab instance with
+`/mnt/gitlab/repositories`, you need to run `gitlab-rake gitlab:cleanup:repos`
+after switching to the new repository storage directory.
+
+### Parallel rsync for all repositories known to GitLab
+
+This will sync repositories with 10 rsync processes at a time. We keep
+track of progress so that the transfer can be restarted if necessary.
+
+First we create a new directory, owned by 'git', to hold transfer
+logs. We assume the directory is empty before we start the transfer
+procedure, and that we are the only ones writing files in it.
+
+```
+# Omnibus
+sudo mkdir /var/opt/gitlab/transfer-logs
+sudo chown git:git /var/opt/gitlab/transfer-logs
+
+# Source
+sudo -u git -H mkdir /home/git/transfer-logs
+```
+
+We seed the process with a list of the directories we want to copy.
+
+```
+# Omnibus
+sudo -u git sh -c 'gitlab-rake gitlab:list_repos > /var/opt/gitlab/transfer-logs/all-repos-$(date +%s).txt'
+
+# Source
+cd /home/git/gitlab
+sudo -u git -H sh -c 'bundle exec rake gitlab:list_repos > /home/git/transfer-logs/all-repos-$(date +%s).txt'
+```
+
+Now we can start the transfer. The command below is idempotent, and
+the number of jobs done by GNU Parallel should converge to zero. If it
+does not, some repositories listed in all-repos-1234.txt may have been
+deleted/renamed before they could be copied.
+
+```
+# Omnibus
+sudo -u git sh -c '
+cat /var/opt/gitlab/transfer-logs/* | sort | uniq -u |\
+  /usr/bin/env JOBS=10 \
+  /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
+  /var/opt/gitlab/transfer-logs/success-$(date +%s).log \
+  /var/opt/gitlab/git-data/repositories \
+  /mnt/gitlab/repositories
+'
+
+# Source
+cd /home/git/gitlab
+sudo -u git -H sh -c '
+cat /home/git/transfer-logs/* | sort | uniq -u |\
+  /usr/bin/env JOBS=10 \
+  bin/parallel-rsync-repos \
+  /home/git/transfer-logs/success-$(date +%s).log \
+  /home/git/repositories \
+  /mnt/gitlab/repositories
+'
+```
+
+### Parallel rsync only for repositories with recent activity
+
+Suppose you have already done one sync that started after 2015-10-1 12:00 UTC.
+Then you might only want to sync repositories that were changed via GitLab
+_after_ that time. You can use the 'SINCE' variable to tell 'rake
+gitlab:list_repos' to only print repositories with recent activity.
+
+```
+# Omnibus
+sudo gitlab-rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
+  sudo -u git \
+  /usr/bin/env JOBS=10 \
+  /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
+  success-$(date +%s).log \
+  /var/opt/gitlab/git-data/repositories \
+  /mnt/gitlab/repositories
+
+# Source
+cd /home/git/gitlab
+sudo -u git -H bundle exec rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
+  sudo -u git -H \
+  /usr/bin/env JOBS=10 \
+  bin/parallel-rsync-repos \
+  success-$(date +%s).log \
+  /home/git/repositories \
+  /mnt/gitlab/repositories
+```
diff --git a/doc/administration/operations/sidekiq_memory_killer.md b/doc/administration/operations/sidekiq_memory_killer.md
new file mode 100644
index 00000000000..b5e78348989
--- /dev/null
+++ b/doc/administration/operations/sidekiq_memory_killer.md
@@ -0,0 +1,40 @@
+# Sidekiq MemoryKiller
+
+The GitLab Rails application code suffers from memory leaks. For web requests
+this problem is made manageable using
+[unicorn-worker-killer](https://github.com/kzk/unicorn-worker-killer) which
+restarts Unicorn worker processes in between requests when needed. The Sidekiq
+MemoryKiller applies the same approach to the Sidekiq processes used by GitLab
+to process background jobs.
+
+Unlike unicorn-worker-killer, which is enabled by default for all GitLab
+installations since GitLab 6.4, the Sidekiq MemoryKiller is enabled by default
+_only_ for Omnibus packages. The reason for this is that the MemoryKiller
+relies on Runit to restart Sidekiq after a memory-induced shutdown and GitLab
+installations from source do not all use Runit or an equivalent.
+
+With the default settings, the MemoryKiller will cause a Sidekiq restart no
+more often than once every 15 minutes, with the restart causing about one
+minute of delay for incoming background jobs.
+
+## Configuring the MemoryKiller
+
+The MemoryKiller is controlled using environment variables.
+
+- `SIDEKIQ_MEMORY_KILLER_MAX_RSS`: if this variable is set, and its value is
+  greater than 0, then after each Sidekiq job, the MemoryKiller will check the
+  RSS of the Sidekiq process that executed the job. If the RSS of the Sidekiq
+  process (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_MAX_RSS`, a
+  delayed shutdown is triggered. The default value for Omnibus packages is set
+  [in the omnibus-gitlab
+  repository](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/attributes/default.rb).
+- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defaults to 900 seconds (15 minutes). When
+  a shutdown is triggered, the Sidekiq process will keep working normally for
+  another 15 minutes.
+- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defaults to 30 seconds. When the grace
+  time has expired, the MemoryKiller tells Sidekiq to stop accepting new jobs.
+  Existing jobs get 30 seconds to finish. After that, the MemoryKiller tells
+  Sidekiq to shut down, and an external supervision mechanism (e.g. Runit) must
+  restart Sidekiq.
+- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_SIGNAL`: defaults to `SIGKILL`. The name of
+  the final signal sent to the Sidekiq process when we want it to shut down.
diff --git a/doc/administration/operations/unicorn.md b/doc/administration/operations/unicorn.md
new file mode 100644
index 00000000000..bad61151bda
--- /dev/null
+++ b/doc/administration/operations/unicorn.md
@@ -0,0 +1,86 @@
+# Understanding Unicorn and unicorn-worker-killer
+
+## Unicorn
+
+GitLab uses [Unicorn](http://unicorn.bogomips.org/), a pre-forking Ruby web
+server, to handle web requests (web browsers and Git HTTP clients). Unicorn is
+a daemon written in Ruby and C that can load and run a Ruby on Rails
+application; in our case the Rails application is GitLab Community Edition or
+GitLab Enterprise Edition.
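To make the pre-forking model concrete, here is a toy supervisor in Python. It is not Unicorn's implementation (Unicorn is written in Ruby and C, and its workers also share a listening socket, which this sketch leaves out); `run_master`, `worker_fn` and `max_spawns` are hypothetical names chosen for illustration. It only shows the fork/reap/respawn loop: the parent forks workers that inherit its state, never runs the work itself, and replaces each worker that exits. It is POSIX-only because it relies on `os.fork()`.

```python
import os


def _exit_code(status):
    """Translate a raw os.wait() status into an exit code (negative = signal)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def run_master(num_workers, worker_fn, max_spawns):
    """Toy pre-forking supervisor (illustration only, not Unicorn's code).

    Forks `num_workers` children that inherit the parent's in-memory state.
    The parent never runs `worker_fn` itself; it only waits for children to
    exit and forks a replacement while fewer than `max_spawns` children have
    been started in total. Returns (worker_id, exit_code) pairs in the order
    the children were reaped.
    """
    children = {}  # pid -> worker id
    spawned = 0

    def spawn(worker_id):
        nonlocal spawned
        pid = os.fork()
        if pid == 0:
            # Child process: run the worker body, then leave via os._exit so
            # control never returns into the parent's supervision loop.
            try:
                code = worker_fn(worker_id)
            except BaseException:
                code = 1
            os._exit(code if isinstance(code, int) else 0)
        children[pid] = worker_id
        spawned += 1

    for worker_id in range(num_workers):
        spawn(worker_id)

    reaped = []
    while children:
        pid, status = os.wait()  # block until any child exits
        if pid not in children:
            continue  # not one of our workers
        worker_id = children.pop(pid)
        reaped.append((worker_id, _exit_code(status)))
        if spawned < max_spawns:
            spawn(worker_id)  # replace the dead worker under the same id
    return reaped


if __name__ == "__main__":
    # Worker 1 always "crashes" with exit code 3; the master keeps replacing it.
    print(run_master(2, lambda wid: 3 if wid == 1 else 0, max_spawns=4))
```

Because the parent never executes `worker_fn`, a crash costs only that one child, and the loop keeps the pool at full size, which is the fault-isolation property the Unicorn description relies on.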
+
+Unicorn has a multi-process architecture to make better use of available CPU
+cores (processes can run on different cores) and to have stronger fault
+tolerance (most failures stay isolated in only one process and cannot take down
+GitLab entirely). On startup, the Unicorn 'master' process loads a clean Ruby
+environment with the GitLab application code, and then spawns 'workers' which
+inherit this clean initial environment. The 'master' never handles any
+requests; that is left to the workers. The operating system network stack
+queues incoming requests and distributes them among the workers.
+
+In a perfect world, the master would spawn its pool of workers once, and then
+the workers handle incoming web requests one after another until the end of
+time. In reality, worker processes can crash or time out: if the master notices
+that a worker takes too long to handle a request it will terminate the worker
+process with SIGKILL ('kill -9'). No matter how the worker process ended, the
+master process will replace it with a new 'clean' process again. Unicorn is
+designed to be able to replace 'crashed' workers without dropping user
+requests.
+
+This is what a Unicorn worker timeout looks like in `unicorn_stderr.log`. The
+master process has PID 56227 below.
+
+```
+[2015-06-05T10:58:08.660325 #56227] ERROR -- : worker=10 PID:53009 timeout (61s > 60s), killing
+[2015-06-05T10:58:08.699360 #56227] ERROR -- : reaped # worker=10
+[2015-06-05T10:58:08.708141 #62538] INFO -- : worker=10 spawned pid=62538
+[2015-06-05T10:58:08.708824 #62538] INFO -- : worker=10 ready
+```
+
+### Tunables
+
+The main tunables for Unicorn are the number of worker processes and the
+request timeout after which the Unicorn master terminates a worker process.
+See the [omnibus-gitlab Unicorn settings
+documentation](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/doc/settings/unicorn.md)
+if you want to adjust these settings.
+
+## unicorn-worker-killer
+
+GitLab has memory leaks. These memory leaks manifest themselves in long-running
+processes, such as Unicorn workers. (The Unicorn master process is not known to
+leak memory, probably because it does not handle user requests.)
+
+To make these memory leaks manageable, GitLab comes with the
+[unicorn-worker-killer gem](https://github.com/kzk/unicorn-worker-killer). This
+gem [monkey-patches](https://en.wikipedia.org/wiki/Monkey_patch) the Unicorn
+workers to do a memory self-check after every 16 requests. If the memory of the
+Unicorn worker exceeds a pre-set limit then the worker process exits. The
+Unicorn master then automatically replaces the worker process.
+
+This is a robust way to handle memory leaks: Unicorn is designed to handle
+workers that 'crash' so no user requests will be dropped. The
+unicorn-worker-killer gem is designed to only terminate a worker process _in
+between requests_, so no user requests are affected.
+
+This is what a Unicorn worker memory restart looks like in `unicorn_stderr.log`.
+You see that worker 4 (PID 125918) is inspecting itself and decides to exit.
+The threshold memory value was 254802235 bytes, about 250 MB. With GitLab this
+threshold is a random value between 200 and 250 MB. The master process (PID
+117565) then reaps the worker process and spawns a new 'worker 4' with PID
+127549.
+
+```
+[2015-06-05T12:07:41.828374 #125918] WARN -- : #: worker (pid: 125918) exceeds memory limit (256413696 bytes > 254802235 bytes)
+[2015-06-05T12:07:41.828472 #125918] WARN -- : Unicorn::WorkerKiller send SIGQUIT (pid: 125918) alive: 23 sec (trial 1)
+[2015-06-05T12:07:42.025916 #117565] INFO -- : reaped # worker=4
+[2015-06-05T12:07:42.034527 #127549] INFO -- : worker=4 spawned pid=127549
+[2015-06-05T12:07:42.035217 #127549] INFO -- : worker=4 ready
+```
+
+One other thing that stands out in the log snippet above, taken from
+GitLab.com, is that 'worker 4' was serving requests for only 23 seconds. This
+is a normal value for our current GitLab.com setup and traffic.
+
+The high frequency of Unicorn memory restarts on some GitLab sites can be a
+source of confusion for administrators. Usually they are a [red
+herring](https://en.wikipedia.org/wiki/Red_herring).
diff --git a/doc/operations/README.md b/doc/operations/README.md
index 6a35dab7b6c..58f16aff7bd 100644
--- a/doc/operations/README.md
+++ b/doc/operations/README.md
@@ -1,5 +1 @@
-# GitLab operations
-
-- [Sidekiq MemoryKiller](sidekiq_memory_killer.md)
-- [Cleaning up Redis sessions](cleaning_up_redis_sessions.md)
-- [Understanding Unicorn and unicorn-worker-killer](unicorn.md)
+This document was moved to [administration/operations](../administration/operations.md).
diff --git a/doc/operations/cleaning_up_redis_sessions.md b/doc/operations/cleaning_up_redis_sessions.md
index 93521e976d5..2a1d0a8c8eb 100644
--- a/doc/operations/cleaning_up_redis_sessions.md
+++ b/doc/operations/cleaning_up_redis_sessions.md
@@ -1,52 +1 @@
-# Cleaning up stale Redis sessions
-
-Since version 6.2, GitLab stores web user sessions as key-value pairs in Redis.
-Prior to GitLab 7.3, user sessions did not automatically expire from Redis. If
-you have been running a large GitLab server (thousands of users) since before
-GitLab 7.3 we recommend cleaning up stale sessions to compact the Redis
-database after you upgrade to GitLab 7.3. You can also perform a cleanup while
-still running GitLab 7.2 or older, but in that case new stale sessions will
-start building up again after you clean up.
-
-In GitLab versions prior to 7.3.0, the session keys in Redis are 16-byte
-hexadecimal values such as '976aa289e2189b17d7ef525a6702ace9'. Starting with
-GitLab 7.3.0, the keys are
-prefixed with 'session:gitlab:', so they would look like
-'session:gitlab:976aa289e2189b17d7ef525a6702ace9'. Below we describe how to
-remove the keys in the old format.
-
-First we define a shell function with the proper Redis connection details.
-
-```
-rcli() {
-  # This example works for Omnibus installations of GitLab 7.3 or newer. For an
-  # installation from source you will have to change the socket path and the
-  # path to redis-cli.
-  sudo /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket "$@"
-}
-
-# test the new shell function; the response should be PONG
-rcli ping
-```
-
-Now we do a search to see if there are any session keys in the old format for
-us to clean up.
-
-```
-# returns the number of old-format session keys in Redis
-rcli keys '*' | grep '^[a-f0-9]\{32\}$' | wc -l
-```
-
-If the number is larger than zero, you can proceed to expire the keys from
-Redis. If the number is zero there is nothing to clean up.
-
-```
-# Tell Redis to expire each matched key after 600 seconds.
-rcli keys '*' | grep '^[a-f0-9]\{32\}$' | awk '{ print "expire", $0, 600 }' | rcli
-# This will print '(integer) 1' for each key that gets expired.
-```
-
-Over the next 15 minutes (10 minutes expiry time plus 5 minutes Redis
-background save interval) your Redis database will be compacted. If you are
-still using GitLab 7.2, users who are not clicking around in GitLab during the
-10 minute expiry window will be signed out of GitLab.
+This document was moved to [administration/operations/cleaning_up_redis_sessions](../administration/operations/cleaning_up_redis_sessions.md).
diff --git a/doc/operations/moving_repositories.md b/doc/operations/moving_repositories.md
index 54adb99386a..c54bca324a5 100644
--- a/doc/operations/moving_repositories.md
+++ b/doc/operations/moving_repositories.md
@@ -1,180 +1 @@
-# Moving repositories managed by GitLab
-
-Sometimes you need to move all repositories managed by GitLab to
-another filesystem or another server. In this document we will look
-at some of the ways you can copy all your repositories from
-`/var/opt/gitlab/git-data/repositories` to `/mnt/gitlab/repositories`.
-
-We will look at three scenarios: the target directory is empty, the
-target directory contains an outdated copy of the repositories, and
-how to deal with thousands of repositories.
-
-**Each of the approaches we list can/will overwrite data in the
-target directory `/mnt/gitlab/repositories`. Do not mix up the
-source and the target.**
-
-## Target directory is empty: use a tar pipe
-
-If the target directory `/mnt/gitlab/repositories` is empty the
-simplest thing to do is to use a tar pipe. This method has low
-overhead and tar is almost always already installed on your system.
-However, it is not possible to resume an interrupted tar pipe: if
-that happens then all data must be copied again.
-
-```
-# As the git user
-tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
-  tar -C /mnt/gitlab/repositories -xf -
-```
-
-If you want to see progress, replace `-xf` with `-xvf`.
-
-### Tar pipe to another server
-
-You can also use a tar pipe to copy data to another server. If your
-'git' user has SSH access to the newserver as 'git@newserver', you
-can pipe the data through SSH.
-
-```
-# As the git user
-tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
-  ssh git@newserver tar -C /mnt/gitlab/repositories -xf -
-```
-
-If you want to compress the data before it goes over the network
-(which will cost you CPU cycles) you can replace `ssh` with `ssh -C`.
-
-## The target directory contains an outdated copy of the repositories: use rsync
-
-If the target directory already contains a partial / outdated copy
-of the repositories it may be wasteful to copy all the data again
-with tar. In this scenario it is better to use rsync. This utility
-is either already installed on your system or easily installable
-via apt, yum etc.
-
-```
-# As the 'git' user
-rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
-  /mnt/gitlab/repositories
-```
-
-The `/.` in the command above is very important, without it you can
-easily get the wrong directory structure in the target directory.
-If you want to see progress, replace `-a` with `-av`.
-
-### Single rsync to another server
-
-If the 'git' user on your source system has SSH access to the target
-server you can send the repositories over the network with rsync.
-
-```
-# As the 'git' user
-rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
-  git@newserver:/mnt/gitlab/repositories
-```
-
-## Thousands of Git repositories: use one rsync per repository
-
-Every time you start an rsync job it has to inspect all files in
-the source directory, all files in the target directory, and then
-decide what files to copy or not. If the source or target directory
-has many contents this startup phase of rsync can become a burden
-for your GitLab server. In cases like this you can make rsync's
-life easier by dividing its work in smaller pieces, and sync one
-repository at a time.
-
-In addition to rsync we will use [GNU
-Parallel](http://www.gnu.org/software/parallel/). This utility is
-not included in GitLab so you need to install it yourself with apt
-or yum. Also note that the GitLab scripts we used below were added
-in GitLab 8.1.
-
-** This process does not clean up repositories at the target location that no
-longer exist at the source. ** If you start using your GitLab instance with
-`/mnt/gitlab/repositories`, you need to run `gitlab-rake gitlab:cleanup:repos`
-after switching to the new repository storage directory.
-
-### Parallel rsync for all repositories known to GitLab
-
-This will sync repositories with 10 rsync processes at a time. We keep
-track of progress so that the transfer can be restarted if necessary.
-
-First we create a new directory, owned by 'git', to hold transfer
-logs. We assume the directory is empty before we start the transfer
-procedure, and that we are the only ones writing files in it.
-
-```
-# Omnibus
-sudo mkdir /var/opt/gitlab/transfer-logs
-sudo chown git:git /var/opt/gitlab/transfer-logs
-
-# Source
-sudo -u git -H mkdir /home/git/transfer-logs
-```
-
-We seed the process with a list of the directories we want to copy.
-
-```
-# Omnibus
-sudo -u git sh -c 'gitlab-rake gitlab:list_repos > /var/opt/gitlab/transfer-logs/all-repos-$(date +%s).txt'
-
-# Source
-cd /home/git/gitlab
-sudo -u git -H sh -c 'bundle exec rake gitlab:list_repos > /home/git/transfer-logs/all-repos-$(date +%s).txt'
-```
-
-Now we can start the transfer. The command below is idempotent, and
-the number of jobs done by GNU Parallel should converge to zero. If it
-does not some repositories listed in all-repos-1234.txt may have been
-deleted/renamed before they could be copied.
-
-```
-# Omnibus
-sudo -u git sh -c '
-cat /var/opt/gitlab/transfer-logs/* | sort | uniq -u |\
-  /usr/bin/env JOBS=10 \
-  /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
-  /var/opt/gitlab/transfer-logs/success-$(date +%s).log \
-  /var/opt/gitlab/git-data/repositories \
-  /mnt/gitlab/repositories
-'
-
-# Source
-cd /home/git/gitlab
-sudo -u git -H sh -c '
-cat /home/git/transfer-logs/* | sort | uniq -u |\
-  /usr/bin/env JOBS=10 \
-  bin/parallel-rsync-repos \
-  /home/git/transfer-logs/success-$(date +%s).log \
-  /home/git/repositories \
-  /mnt/gitlab/repositories
-`
-```
-
-### Parallel rsync only for repositories with recent activity
-
-Suppose you have already done one sync that started after 2015-10-1 12:00 UTC.
-Then you might only want to sync repositories that were changed via GitLab
-_after_ that time. You can use the 'SINCE' variable to tell 'rake
-gitlab:list_repos' to only print repositories with recent activity.
-
-```
-# Omnibus
-sudo gitlab-rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
-  sudo -u git \
-  /usr/bin/env JOBS=10 \
-  /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
-  success-$(date +%s).log \
-  /var/opt/gitlab/git-data/repositories \
-  /mnt/gitlab/repositories
-
-# Source
-cd /home/git/gitlab
-sudo -u git -H bundle exec rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
-  sudo -u git -H \
-  /usr/bin/env JOBS=10 \
-  bin/parallel-rsync-repos \
-  success-$(date +%s).log \
-  /home/git/repositories \
-  /mnt/gitlab/repositories
-```
+This document was moved to [administration/operations/moving_repositories](../administration/operations/moving_repositories.md).
diff --git a/doc/operations/sidekiq_memory_killer.md b/doc/operations/sidekiq_memory_killer.md
index b5e78348989..cf7c3b2e2ed 100644
--- a/doc/operations/sidekiq_memory_killer.md
+++ b/doc/operations/sidekiq_memory_killer.md
@@ -1,40 +1 @@
-# Sidekiq MemoryKiller
-
-The GitLab Rails application code suffers from memory leaks. For web requests
-this problem is made manageable using
-[unicorn-worker-killer](https://github.com/kzk/unicorn-worker-killer) which
-restarts Unicorn worker processes in between requests when needed. The Sidekiq
-MemoryKiller applies the same approach to the Sidekiq processes used by GitLab
-to process background jobs.
-
-Unlike unicorn-worker-killer, which is enabled by default for all GitLab
-installations since GitLab 6.4, the Sidekiq MemoryKiller is enabled by default
-_only_ for Omnibus packages. The reason for this is that the MemoryKiller
-relies on Runit to restart Sidekiq after a memory-induced shutdown and GitLab
-installations from source do not all use Runit or an equivalent.
-
-With the default settings, the MemoryKiller will cause a Sidekiq restart no
-more often than once every 15 minutes, with the restart causing about one
-minute of delay for incoming background jobs.
-
-## Configuring the MemoryKiller
-
-The MemoryKiller is controlled using environment variables.
-
-- `SIDEKIQ_MEMORY_KILLER_MAX_RSS`: if this variable is set, and its value is
-  greater than 0, then after each Sidekiq job, the MemoryKiller will check the
-  RSS of the Sidekiq process that executed the job. If the RSS of the Sidekiq
-  process (expressed in kilobytes) exceeds SIDEKIQ_MEMORY_KILLER_MAX_RSS, a
-  delayed shutdown is triggered. The default value for Omnibus packages is set
-  [in the omnibus-gitlab
-  repository](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/attributes/default.rb).
-- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defaults 900 seconds (15 minutes). When
-  a shutdown is triggered, the Sidekiq process will keep working normally for
-  another 15 minutes.
-- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defaults to 30 seconds. When the grace
-  time has expired, the MemoryKiller tells Sidekiq to stop accepting new jobs.
-  Existing jobs get 30 seconds to finish. After that, the MemoryKiller tells
-  Sidekiq to shut down, and an external supervision mechanism (e.g. Runit) must
-  restart Sidekiq.
-- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_SIGNAL`: defaults to `SIGKILL`. The name of
-  the final signal sent to the Sidekiq process when we want it to shut down.
+This document was moved to [administration/operations/sidekiq_memory_killer](../administration/operations/sidekiq_memory_killer.md).
diff --git a/doc/operations/unicorn.md b/doc/operations/unicorn.md
index bad61151bda..fbc9697b755 100644
--- a/doc/operations/unicorn.md
+++ b/doc/operations/unicorn.md
@@ -1,86 +1 @@
-# Understanding Unicorn and unicorn-worker-killer
-
-## Unicorn
-
-GitLab uses [Unicorn](http://unicorn.bogomips.org/), a pre-forking Ruby web
-server, to handle web requests (web browsers and Git HTTP clients). Unicorn is
-a daemon written in Ruby and C that can load and run a Ruby on Rails
-application; in our case the Rails application is GitLab Community Edition or
-GitLab Enterprise Edition.
-
-Unicorn has a multi-process architecture to make better use of available CPU
-cores (processes can run on different cores) and to have stronger fault
-tolerance (most failures stay isolated in only one process and cannot take down
-GitLab entirely). On startup, the Unicorn 'master' process loads a clean Ruby
-environment with the GitLab application code, and then spawns 'workers' which
-inherit this clean initial environment. The 'master' never handles any
-requests, that is left to the workers. The operating system network stack
-queues incoming requests and distributes them among the workers.
-
-In a perfect world, the master would spawn its pool of workers once, and then
-the workers handle incoming web requests one after another until the end of
-time. In reality, worker processes can crash or time out: if the master notices
-that a worker takes too long to handle a request it will terminate the worker
-process with SIGKILL ('kill -9'). No matter how the worker process ended, the
-master process will replace it with a new 'clean' process again. Unicorn is
-designed to be able to replace 'crashed' workers without dropping user
-requests.
-
-This is what a Unicorn worker timeout looks like in `unicorn_stderr.log`. The
-master process has PID 56227 below.
-
-```
-[2015-06-05T10:58:08.660325 #56227] ERROR -- : worker=10 PID:53009 timeout (61s > 60s), killing
-[2015-06-05T10:58:08.699360 #56227] ERROR -- : reaped # worker=10
-[2015-06-05T10:58:08.708141 #62538] INFO -- : worker=10 spawned pid=62538
-[2015-06-05T10:58:08.708824 #62538] INFO -- : worker=10 ready
-```
-
-### Tunables
-
-The main tunables for Unicorn are the number of worker processes and the
-request timeout after which the Unicorn master terminates a worker process.
-See the [omnibus-gitlab Unicorn settings -documentation](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/doc/settings/unicorn.md) -if you want to adjust these settings. - -## unicorn-worker-killer - -GitLab has memory leaks. These memory leaks manifest themselves in long-running -processes, such as Unicorn workers. (The Unicorn master process is not known to -leak memory, probably because it does not handle user requests.) - -To make these memory leaks manageable, GitLab comes with the -[unicorn-worker-killer gem](https://github.com/kzk/unicorn-worker-killer). This -gem [monkey-patches](https://en.wikipedia.org/wiki/Monkey_patch) the Unicorn -workers to do a memory self-check after every 16 requests. If the memory of the -Unicorn worker exceeds a pre-set limit then the worker process exits. The -Unicorn master then automatically replaces the worker process. - -This is a robust way to handle memory leaks: Unicorn is designed to handle -workers that 'crash' so no user requests will be dropped. The -unicorn-worker-killer gem is designed to only terminate a worker process _in -between requests_, so no user requests are affected. - -This is what a Unicorn worker memory restart looks like in unicorn_stderr.log. -You see that worker 4 (PID 125918) is inspecting itself and decides to exit. -The threshold memory value was 254802235 bytes, about 250MB. With GitLab this -threshold is a random value between 200 and 250 MB. The master process (PID -117565) then reaps the worker process and spawns a new 'worker 4' with PID -127549. 
- -``` -[2015-06-05T12:07:41.828374 #125918] WARN -- : #: worker (pid: 125918) exceeds memory limit (256413696 bytes > 254802235 bytes) -[2015-06-05T12:07:41.828472 #125918] WARN -- : Unicorn::WorkerKiller send SIGQUIT (pid: 125918) alive: 23 sec (trial 1) -[2015-06-05T12:07:42.025916 #117565] INFO -- : reaped # worker=4 -[2015-06-05T12:07:42.034527 #127549] INFO -- : worker=4 spawned pid=127549 -[2015-06-05T12:07:42.035217 #127549] INFO -- : worker=4 ready -``` - -One other thing that stands out in the log snippet above, taken from -GitLab.com, is that 'worker 4' was serving requests for only 23 seconds. This -is a normal value for our current GitLab.com setup and traffic. - -The high frequency of Unicorn memory restarts on some GitLab sites can be a -source of confusion for administrators. Usually they are a [red -herring](https://en.wikipedia.org/wiki/Red_herring). +This document was moved to [administration/operations/unicorn](../administration/operations/unicorn.md). -- cgit v1.2.1