author Jacob Vosmaer <contact@jacobvosmaer.nl> 2015-09-25 18:31:54 +0200
committer Jacob Vosmaer <contact@jacobvosmaer.nl> 2015-09-25 18:32:02 +0200
commit 5bcd0efe3e0b1fef06147d87f843adac717d7c42
tree 17397dba894df43d599cfb6a5d073446d3fd4090
parent 7a8a892efdf59925a95cdf6504f7c74c31b87eeb
Add parallel-rsync-repos script and start docs
-rw-r--r-- bin/parallel-rsync-repos 26
-rw-r--r-- doc/operations/rsyncing_repositories.md 87
2 files changed, 113 insertions, 0 deletions
diff --git a/bin/parallel-rsync-repos b/bin/parallel-rsync-repos
new file mode 100644
index 00000000000..b2429f743b5
--- /dev/null
+++ b/bin/parallel-rsync-repos
@@ -0,0 +1,26 @@
+#!/bin/sh
+# this script should run as the 'git' user, not root, because of mkdir
+#
+# Example invocation:
+# find /var/opt/gitlab/git-data/repositories -maxdepth 2 | \
+# parallel-rsync-repos /var/opt/gitlab/git-data/repositories /mnt/gitlab/repositories
+
+SRC=$1
+DEST=$2
+
+if [ -z "$JOBS" ] ; then
+ JOBS=10
+fi
+
+if [ -z "$SRC" ] || [ -z "$DEST" ] ; then
+ echo "Usage: $0 SRC DEST"
+ exit 1
+fi
+
+if ! cd "$SRC" ; then
+ echo "cd $SRC failed"
+ exit 1
+fi
+
+sed "s|$SRC|./|" |\
+ parallel -j$JOBS --progress "mkdir -p $DEST/{} && rsync --delete -a {}/. $DEST/{}/"
diff --git a/doc/operations/rsyncing_repositories.md b/doc/operations/rsyncing_repositories.md
new file mode 100644
index 00000000000..231e09f0462
--- /dev/null
+++ b/doc/operations/rsyncing_repositories.md
@@ -0,0 +1,87 @@
+# Moving repositories managed by GitLab
+
+Sometimes you need to move all repositories managed by GitLab to
+another filesystem or another server. In this document we will look
+at some of the ways you can copy all your repositories from
+`/var/opt/gitlab/git-data/repositories` to `/mnt/gitlab/repositories`.
+
+We will look at three scenarios: the target directory is empty, the
+target directory contains an outdated copy of the repositories, and
+there are thousands of repositories to copy.
+
+**Each of the approaches we list can and will overwrite data in the
+target directory `/mnt/gitlab/repositories`. Do not mix up the
+source and the target.**
+
+## Target directory is empty: use a tar pipe
+
+If the target directory `/mnt/gitlab/repositories` is empty the
+simplest thing to do is to use a tar pipe.
+
+```
+# As the git user
+tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
+ tar -C /mnt/gitlab/repositories -xf -
+```
+
+If you want to see progress, replace `-xf` with `-xvf`.
+
+### Tar pipe to another server
+
+You can also use a tar pipe to copy data to another server. If your
+'git' user has SSH access to the new server as 'git@newserver', you
+can pipe the data through SSH.
+
+```
+# As the git user
+tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
+ ssh git@newserver tar -C /mnt/gitlab/repositories -xf -
+```
+
+If you want to compress the data before it goes over the network
+(which will cost you CPU cycles) you can replace `ssh` with
+`ssh -C`.
+
+## The target directory contains an outdated copy of the repositories: use rsync
+
+In this scenario it is better to use rsync. This utility is either
+already installed on your system or easily installable via apt, yum
+etc.
+
+```
+# As the 'git' user
+rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
+ /mnt/gitlab/repositories
+```
+
+The `/.` in the command above is very important: without it you can
+easily get the wrong directory structure in the target directory.
+If you want to see progress, replace `-a` with `-av`.
+
+### Single rsync to another server
+
+If the 'git' user on your source system has SSH access to the target
+server you can send the repositories over the network with rsync.
+
+```
+# As the 'git' user
+rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
+ git@newserver:/mnt/gitlab/repositories
+```
+
+## Thousands of Git repositories: use one rsync per repository
+
+Every time you start an rsync job it has to inspect all files in
+the source directory, all files in the target directory, and then
+decide which files to copy. If the source or target directory
+contains many files, this startup phase of rsync can become a burden
+for your GitLab server. In cases like this you can make rsync's
+life easier by dividing its work into smaller pieces and syncing one
+repository at a time.
+
+In addition to rsync we will use [GNU
+Parallel](http://www.gnu.org/software/parallel/). This utility is
+not included in GitLab so you need to install it yourself with apt
+or yum. Also note that the GitLab scripts we use below were added
+in GitLab 8.???.
+
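For illustration only, and not part of the commit: the sketch below mimics how `bin/parallel-rsync-repos` turns `find` output into paths relative to `SRC` before handing them to GNU Parallel. The throwaway directory, the group and project names, the `list_relative_repos` helper, and the `-mindepth 2` filter are assumptions made for this example; the script's own example invocation uses only `-maxdepth 2`.

```shell
#!/bin/sh
# Hypothetical stand-in for /var/opt/gitlab/git-data/repositories.
SRC=$(mktemp -d)
mkdir -p "$SRC/group-a/project-1.git" "$SRC/group-b/project-2.git"

# Same pipeline shape as the script: list directories two levels deep,
# then rewrite the absolute SRC prefix to './' so that every path is
# relative to SRC. -mindepth 2 (an assumption of this sketch) skips SRC
# itself and the group directories.
list_relative_repos() {
  find "$1" -mindepth 2 -maxdepth 2 -type d | sed "s|$1|./|" | sort
}

list_relative_repos "$SRC"

# Clean up the throwaway directory.
rm -rf "$SRC"
```

Each printed line, such as `.//group-a/project-1.git`, corresponds to what the script substitutes for `{}` in its `mkdir -p` and `rsync` commands.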