-rw-r--r--  README.mdwn                                   34
-rwxr-xr-x  backup-snapshot                              249
-rwxr-xr-x  baserock_backup/backup.sh                     25
-rw-r--r--  baserock_backup/instance-config.yml           29
-rw-r--r--  baserock_backup/ssh_config                     4
-rw-r--r--  baserock_gerrit/backup-snapshot.conf           5
-rw-r--r--  baserock_gerrit/instance-backup-config.yml    29
-rw-r--r--  database/backup-snapshot.conf                  4
-rw-r--r--  database/instance-backup-config.yml           26
-rw-r--r--  frontend/instance-backup-config.yml           23
10 files changed, 407 insertions, 21 deletions
diff --git a/README.mdwn b/README.mdwn
index ecf902a1..4a8a1635 100644
--- a/README.mdwn
+++ b/README.mdwn
@@ -53,27 +53,19 @@ To run an ad-hoc command (upgrading, for example):
Backups
-------
-The database server doesn't yet have automated backups running. You can
-manually take a backup like this:
-
- sudo systemctl stop mariadb.service
- sudo lvcreate \
- --name database-backup-20150126 \
- --snapshot /dev/vg0/database \
- --extents 100%ORIGIN \
- --permission=r
- sudo systemctl start mariadb.service
- sudo mount /dev/vg0/database-backup-20150126 /mnt
- # use your preferred backup tool (`rsync` is recommended) to extract the
- # contents of /mnt somewhere safe.
- sudo umount /dev/vg0/database-backup-20150126
- sudo lvremove /dev/vg0/database-backup-20150126
-
-The Gerrit instance stores the Gerrit site path on an LVM volume and can be
-manually backed up in exactly the same way.
-
-git.baserock.org has automated backups of /home and /etc, which are run by
-Codethink to an internal Codethink server.
+Backups of git.baserock.org's data volume are run by and stored on a
+Codethink-managed machine named 'access'. They will need to migrate off this
+system before long. The backups are taken without pausing services or
+snapshotting the data, so they will not be 100% clean. The current
+git.baserock.org data volume does not use LVM and cannot be easily snapshotted.
+
+Backups of 'gerrit' and 'database' are handled by the
+'baserock_backup/backup.sh' script. This currently runs on an instance in
+Codethink's internal OpenStack cloud.
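+
+To run a backup outside of the cron schedule (assuming the machine has been
+set up as described in baserock_backup/instance-config.yml), log in as the
+'backup' user and run:
+
+    /home/backup/backup.sh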
+
+Instances themselves are not backed up. In the event of a crisis we will
+redeploy them from the infrastructure.git repository. There should be nothing
+valuable stored outside of the data volumes that are backed up.
Deployment with Packer
diff --git a/backup-snapshot b/backup-snapshot
new file mode 100755
index 00000000..ce9ae88f
--- /dev/null
+++ b/backup-snapshot
@@ -0,0 +1,249 @@
+#!/usr/bin/python
+#
+# Copyright (C) 2015 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program. If not, see <http://www.gnu.org/licenses/>.
+
+
+'''Create a temporary backup snapshot of a volume.
+
+This program is intended as a wrapper for `rsync`, to allow copying data out
+of the system with a minimum of service downtime. You can't copy data from a
+volume used by a service like MariaDB or Gerrit while that service is running,
+because the contents will change underneath your feet while you copy them. This
+script assumes the data is stored on an LVM volume, so you can stop the
+services, snapshot the volume, start the services again and then copy the data
+out from the snapshot.
+
+To use it, you need to use the 'command' feature of the .ssh/authorized_keys
+file, which causes OpenSSH to run a given command whenever a given SSH key
+connects (instead of allowing the owner of the key to run any command). This
+ensures that even if the backup key is compromised, all the attacker can do is
+make backups, and even then only when connecting from the IP address listed
+in the key's 'from' option:
+
+ command=/usr/bin/backup-snapshot <key details>
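+
+With the restrictions that the playbooks in this repository apply, a complete
+entry might look something like this (key material elided):
+
+    from="192.168.222.21",no-agent-forwarding,no-port-forwarding,no-X11-forwarding,command="/usr/bin/backup-snapshot" ssh-rsa AAAA... backup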
+
+You'll need to create a YAML configuration file in /etc/backup-snapshot.conf
+that describes how to create the snapshot. Here's an example:
+
+ services:
+ - lorry-controller-minion@1.service
+ - gerrit.service
+
+ volume: /dev/vg0/gerrit
+
+To test this out, run:
+
+ rsync root@192.168.0.1: /srv/backup --rsync-path="/usr/bin/backup-snapshot"
+
+There is a Perl script named 'rrsync' that does something similar:
+
+ http://git.baserock.org/cgi-bin/cgit.cgi/delta/rsync.git/tree/support/rrsync
+
+'''
+
+
+import contextlib
+import logging
+import os
+import signal
+import shlex
+import subprocess
+import sys
+import tempfile
+import time
+import traceback
+import yaml
+
+
+CONFIG_FILE = '/etc/backup-snapshot.conf'
+
+
+def status(msg, *format):
+ # Messages have to go on stderr because rsync communicates on stdout.
+ logging.info(msg, *format)
+ sys.stderr.write(msg % format + '\n')
+
+
+def run_command(argv):
+ '''Run a command, raising an exception on failure.
+
+ Output on stdout is returned.
+ '''
+ logging.debug("Running: %s", argv)
+ output = subprocess.check_output(argv, close_fds=True)
+
+ logging.debug("Output: %s", output)
+ return output
+
+
+@contextlib.contextmanager
+def pause_services(services):
+ '''Stop a set of systemd services for the duration of a 'with' block.'''
+
+ logging.info("Pausing services: %s", services)
+ try:
+ for service in services:
+ run_command(['systemctl', 'stop', service])
+ yield
+ finally:
+ for service in services:
+ run_command(['systemctl', 'start', service])
+ logging.info("Restarted services: %s", services)
+
+
+def snapshot_volume(volume_path, suffix=None):
+ '''Create a snapshot of an LVM volume.'''
+
+ volume_group_path, volume_name = os.path.split(volume_path)
+
+ if suffix is None:
+ suffix = time.strftime('-backup-%Y-%m-%d')
+ snapshot_name = volume_name + suffix
+
+ logging.info("Snapshotting volume %s as %s", volume_path, snapshot_name)
+    run_command(['lvcreate', '--name', snapshot_name, '--snapshot', volume_path,
+                 '--extents', '100%ORIGIN', '--permission=r'])
+
+ snapshot_path = os.path.join(volume_group_path, snapshot_name)
+ return snapshot_path
+
+
+def delete_volume(volume_path):
+ '''Delete an LVM volume or snapshot.'''
+
+ # Sadly, --force seems necessary, because activation applies to the whole
+ # volume group rather than to the individual volumes so we can't deactivate
+ # only the snapshot before removing it.
+ logging.info("Deleting volume %s", volume_path)
+ run_command(['lvremove', '--force', volume_path])
+
+
+@contextlib.contextmanager
+def mount(block_device, path=None):
+ '''Mount a block device for the duration of 'with' block.'''
+
+ if path is None:
+ path = tempfile.mkdtemp()
+ tempdir = path
+ logging.debug('Created temporary directory %s', tempdir)
+ else:
+ tempdir = None
+
+ try:
+ run_command(['mount', block_device, path])
+ try:
+ yield path
+ finally:
+ run_command(['umount', path])
+ finally:
+ if tempdir is not None:
+            os.rmdir(tempdir)
+            logging.debug('Removed temporary directory %s', tempdir)
+
+
+def load_config(filename):
+ '''Load configuration from a YAML file.'''
+
+ logging.info("Loading config from %s", filename)
+ with open(filename, 'r') as f:
+ config = yaml.safe_load(f)
+
+ logging.debug("Config: %s", config)
+ return config
+
+
+def get_rsync_sender_flag(rsync_commandline):
+ '''Parse an 'rsync --server' commandline to get the --sender ID.
+
+ This parses a remote commandline, so be careful.
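+
+    As an illustration (the exact flags vary between rsync versions), a client
+    typically sends something like "rsync --server --sender -logDtpre.iLsfx
+    . <path>"; in that case this function would return "-logDtpre.iLsfx".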
+
+ '''
+ args = shlex.split(rsync_commandline)
+ if args[0] != 'rsync':
+ raise RuntimeError("Not passed an rsync commandline.")
+
+ for i, arg in enumerate(args):
+ if arg == '--sender':
+ sender = args[i + 1]
+ return sender
+ else:
+ raise RuntimeError("Did not find --sender flag.")
+
+
+def run_rsync_server(source_path, sender_flag):
+ # Adding '/' to the source_path tells rsync that we want the /contents/
+ # of that directory, not the directory itself.
+ #
+ # You'll have realised that it doesn't actually matter what remote path the
+ # user passes to their local rsync.
+ rsync_command = ['rsync', '--server', '--sender', sender_flag, '.',
+ source_path + '/']
+ logging.debug("Running: %s", rsync_command)
+ subprocess.check_call(rsync_command, stdout=sys.stdout)
+
+
+def main():
+ logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s',
+ datefmt='%Y-%m-%d %H:%M:%S',
+ filename='/var/log/backup-snapshot.log',
+ level=logging.DEBUG)
+
+ logging.debug("Running as UID %i GID %i", os.getuid(), os.getgid())
+
+ # Ensure that clean up code (various 'finally' blocks in the functions
+ # above) always runs. This is important to ensure we never leave services
+ # stopped if the process is interrupted somehow.
+
+ signal.signal(signal.SIGHUP, signal.default_int_handler)
+
+ config = load_config(CONFIG_FILE)
+
+ # Check commandline early, so we don't stop services just to then
+ # give an error message.
+ rsync_command = os.environ.get('SSH_ORIGINAL_COMMAND', '')
+ logging.info("Original SSH command: %s", rsync_command)
+
+ if len(rsync_command) == 0:
+ # For testing only -- this can only happen if
+ # ~/.ssh/authorized_keys isn't set up as described above.
+ logging.info("Command line: %s", sys.argv)
+ rsync_command = 'rsync ' + ' '.join(sys.argv[1:])
+
+ # We want to ignore as much as possible of the
+ # SSH_ORIGINAL_COMMAND, because it's a potential attack vector.
+ # If an attacker has somehow got hold of the backup SSH key,
+ # they can pass whatever they want, so we hardcode the 'rsync'
+ # commandline here instead of honouring what the user passed
+ # in. We can anticipate everything except the '--sender' flag.
+ sender_flag = get_rsync_sender_flag(rsync_command)
+
+ with pause_services(config['services']):
+ snapshot_path = snapshot_volume(config['volume'])
+
+ try:
+ with mount(snapshot_path) as mount_path:
+            run_rsync_server(mount_path, sender_flag)
+
+ status("rsync server process exited with success.")
+ finally:
+ delete_volume(snapshot_path)
+
+
+try:
+ status('backup-snapshot started')
+ main()
+except RuntimeError as e:
+    sys.stderr.write('ERROR: %s\n' % e)
+    sys.exit(1)
+except Exception as e:
+ logging.debug(traceback.format_exc())
+ raise
diff --git a/baserock_backup/backup.sh b/baserock_backup/backup.sh
new file mode 100755
index 00000000..f16ba447
--- /dev/null
+++ b/baserock_backup/backup.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+# These aren't normal invocations of rsync: the targets use the
+# 'command' option in /root/.ssh/authorized_keys to force execution of
+# the 'backup-snapshot' script at the remote end, which then starts the
+# rsync server process. So the backup SSH key can only be used to make
+# backups, nothing more.
+
+# Don't make the mistake of trying to run this from a systemd unit. There is
+# some brokenness in systemd that causes the SSH connection forwarding to not
+# work, so you will not be able to connect to the remote machines.
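+#
+# Instead, run it from the 'backup' user's crontab. The cron job installed by
+# baserock_backup/instance-config.yml is equivalent to:
+#
+#     0 0 * * * /home/backup/backup.sh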
+
+# Database
+/usr/bin/rsync --archive --delete-before --delete-excluded \
+ --hard-links --human-readable --progress --sparse \
+ root@192.168.222.30: /srv/backup/database
+date > /srv/backup/database.timestamp
+
+# Gerrit
+/usr/bin/rsync --archive --delete-before --delete-excluded \
+ --hard-links --human-readable --progress --sparse \
+ --exclude='cache/' --exclude='tmp/' \
+ root@192.168.222.69: /srv/backup/gerrit
+date > /srv/backup/gerrit.timestamp
+
diff --git a/baserock_backup/instance-config.yml b/baserock_backup/instance-config.yml
new file mode 100644
index 00000000..327b84e9
--- /dev/null
+++ b/baserock_backup/instance-config.yml
@@ -0,0 +1,29 @@
+# Configuration for a machine that runs data backups of baserock.org.
+#
+# The current backup machine is not a reproducible deployment, but this
+# playbook should be easily adaptable to produce a properly reproducible
+# one.
+---
+- hosts: baserock-backup1
+ gather_facts: false
+ tasks:
+ - name: user for running backups
+ user: name=backup
+
+ # You'll need to copy in the SSH key manually for this user.
+
+ - name: SSH config for backup user
+ copy: src=ssh_config dest=/home/backup/.ssh/config
+
+ - name: backup script
+ copy: src=backup.sh dest=/home/backup/backup.sh mode=755
+
+ # You will need https://github.com/ansible/ansible-modules-core/pull/986
+ # for this to work.
+ - name: backup cron job, runs every day at midnight
+ cron:
+ hour: 00
+ minute: 00
+ job: /home/backup/backup.sh
+ name: baserock.org data backup
+ user: backup
diff --git a/baserock_backup/ssh_config b/baserock_backup/ssh_config
new file mode 100644
index 00000000..e14b38a0
--- /dev/null
+++ b/baserock_backup/ssh_config
@@ -0,0 +1,4 @@
+# SSH configuration to route all requests to baserock.org systems
+# via the frontend system, 185.43.218.170.
+Host 192.168.222.*
+ ProxyCommand ssh backup@185.43.218.170 -W %h:%p
diff --git a/baserock_gerrit/backup-snapshot.conf b/baserock_gerrit/backup-snapshot.conf
new file mode 100644
index 00000000..e8e2f3fc
--- /dev/null
+++ b/baserock_gerrit/backup-snapshot.conf
@@ -0,0 +1,5 @@
+services:
+ - lorry-controller-minion@1.service
+ - gerrit.service
+
+volume: /dev/vg0/gerrit
diff --git a/baserock_gerrit/instance-backup-config.yml b/baserock_gerrit/instance-backup-config.yml
new file mode 100644
index 00000000..60434b5d
--- /dev/null
+++ b/baserock_gerrit/instance-backup-config.yml
@@ -0,0 +1,29 @@
+# Instance backup configuration for the baserock.org Gerrit system.
+---
+- hosts: gerrit
+ gather_facts: false
+ vars:
+ FRONTEND_IP: 192.168.222.21
+ tasks:
+ - name: backup-snapshot script
+ copy: src=../backup-snapshot dest=/usr/bin/backup-snapshot mode=755
+
+ - name: backup-snapshot config
+ copy: src=backup-snapshot.conf dest=/etc/backup-snapshot.conf
+
+ # Would be good to limit this to 'backup' user.
+ - name: passwordless sudo
+ lineinfile: dest=/etc/sudoers state=present line='%wheel ALL=(ALL) NOPASSWD:ALL' validate='visudo -cf %s'
+
+ # We need to give the backup automation 'root' access, because it needs to
+ # manage system services, LVM volumes, and mounts, and because it needs to
+ # be able to read private data. The risk of having the backup key
+ # compromised is mitigated by only allowing it to execute the
+ # 'backup-snapshot' script, and limiting the hosts it can be used from.
+ - name: access for backup SSH key
+ authorized_key:
+ user: root
+ key: "{{ lookup('file', '../keys/backup.key.pub') }}"
+      # Quotes are important in these options; the OpenSSH server will reject
+ # the entry if the 'from' or 'command' values are not quoted.
+ key_options: 'from="{{FRONTEND_IP}}",no-agent-forwarding,no-port-forwarding,no-X11-forwarding,command="/usr/bin/backup-snapshot"'
diff --git a/database/backup-snapshot.conf b/database/backup-snapshot.conf
new file mode 100644
index 00000000..cb3a2ff0
--- /dev/null
+++ b/database/backup-snapshot.conf
@@ -0,0 +1,4 @@
+services:
+ - mariadb.service
+
+volume: /dev/vg0/database
diff --git a/database/instance-backup-config.yml b/database/instance-backup-config.yml
new file mode 100644
index 00000000..79e5ff6c
--- /dev/null
+++ b/database/instance-backup-config.yml
@@ -0,0 +1,26 @@
+# Instance backup configuration for the baserock.org database.
+---
+- hosts: database-mariadb
+ gather_facts: false
+ sudo: yes
+ vars:
+ FRONTEND_IP: 192.168.222.21
+ tasks:
+ - name: backup-snapshot script
+ copy: src=../backup-snapshot dest=/usr/bin/backup-snapshot mode=755
+
+ - name: backup-snapshot config
+ copy: src=backup-snapshot.conf dest=/etc/backup-snapshot.conf
+
+ # We need to give the backup automation 'root' access, because it needs to
+ # manage system services, LVM volumes, and mounts, and because it needs to
+ # be able to read private data. The risk of having the backup key
+ # compromised is mitigated by only allowing it to execute the
+ # 'backup-snapshot' script, and limiting the hosts it can be used from.
+ - name: access for backup SSH key
+ authorized_key:
+ user: root
+ key: "{{ lookup('file', '../keys/backup.key.pub') }}"
+      # Quotes are important in these options; the OpenSSH server will reject
+ # the entry if the 'from' or 'command' values are not quoted.
+ key_options: 'from="{{FRONTEND_IP}}",no-agent-forwarding,no-port-forwarding,no-X11-forwarding,command="/usr/bin/backup-snapshot"'
diff --git a/frontend/instance-backup-config.yml b/frontend/instance-backup-config.yml
new file mode 100644
index 00000000..8f7ca550
--- /dev/null
+++ b/frontend/instance-backup-config.yml
@@ -0,0 +1,23 @@
+# Instance backup configuration for the baserock.org frontend system.
+#
+# We don't need to back anything up from this system, but the backup
+# SSH key needs access to it in order to SSH to the other systems on the
+# internal network.
+---
+- hosts: frontend-haproxy
+ gather_facts: false
+ sudo: yes
+ vars:
+ # The 'backup' key cannot be used to SSH into the 'frontend' machine except
+ # from this IP.
+ PERMITTED_BACKUP_HOSTS: 82.70.136.246/32
+ tasks:
+ - name: backup user
+ user:
+ name: backup
+
+ - name: authorize backup public key
+ authorized_key:
+ user: backup
+ key: "{{ lookup('file', '../keys/backup.key.pub') }}"
+ key_options: 'from="{{ PERMITTED_BACKUP_HOSTS }}",no-agent-forwarding,no-X11-forwarding'