-rw-r--r-- | README.mdwn | 34
-rwxr-xr-x | backup-snapshot | 249
-rwxr-xr-x | baserock_backup/backup.sh | 25
-rw-r--r-- | baserock_backup/instance-config.yml | 29
-rw-r--r-- | baserock_backup/ssh_config | 4
-rw-r--r-- | baserock_gerrit/backup-snapshot.conf | 5
-rw-r--r-- | baserock_gerrit/instance-backup-config.yml | 29
-rw-r--r-- | database/backup-snapshot.conf | 4
-rw-r--r-- | database/instance-backup-config.yml | 26
-rw-r--r-- | frontend/instance-backup-config.yml | 23
10 files changed, 407 insertions, 21 deletions
diff --git a/README.mdwn b/README.mdwn
index ecf902a1..4a8a1635 100644
--- a/README.mdwn
+++ b/README.mdwn
@@ -53,27 +53,19 @@ To run an ad-hoc command (upgrading, for example):
 Backups
 -------
 
-The database server doesn't yet have automated backups running. You can
-manually take a backup like this:
-
-    sudo systemctl stop mariadb.service
-    sudo lvcreate \
-        --name database-backup-20150126 \
-        --snapshot /dev/vg0/database \
-        --extents 100%ORIGIN \
-        --permission=r
-    sudo systemctl start mariadb.service
-    sudo mount /dev/vg0/database-backup-20150126 /mnt
-    # use your preferred backup tool (`rsync` is recommended) to extract the
-    # contents of /mnt somewhere safe.
-    sudo umount /dev/vg0/database-backup-20150126
-    sudo lvremove /dev/vg0/database-backup-20150126
-
-The Gerrit instance stores the Gerrit site path on an LVM volume and can be
-manually backed up in exactly the same way.
-
-git.baserock.org has automated backups of /home and /etc, which are run by
-Codethink to an internal Codethink server.
+Backups of git.baserock.org's data volume are run by and stored on a
+Codethink-managed machine named 'access'. They will need to migrate off this
+system before long. The backups are taken without pausing services or
+snapshotting the data, so they will not be 100% clean. The current
+git.baserock.org data volume does not use LVM and cannot be easily snapshotted.
+
+Backups of 'gerrit' and 'database' are handled by the
+'baserock_backup/backup.sh' script. This currently runs on an instance in
+Codethink's internal OpenStack cloud.
+
+Instances themselves are not backed up. In the event of a crisis we will
+redeploy them from the infrastructure.git repository. There should be nothing
+valuable stored outside of the data volumes that are backed up.
 
 
 Deployment with Packer
diff --git a/backup-snapshot b/backup-snapshot
new file mode 100755
index 00000000..ce9ae88f
--- /dev/null
+++ b/backup-snapshot
@@ -0,0 +1,249 @@
+#!/usr/bin/python
+#
+# Copyright (C) 2015 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program. If not, see <http://www.gnu.org/licenses/>.
+
+
+'''Create a temporary backup snapshot of a volume.
+
+This program is intended as a wrapper for `rsync`, to allow copying data out
+of the system with a minimum of service downtime. You can't copy data from a
+volume used by a service like MariaDB or Gerrit while that service is running,
+because the contents will change underneath your feet while you copy them. This
+script assumes the data is stored on an LVM volume, so you can stop the
+services, snapshot the volume, start the services again and then copy the data
+out from the snapshot.
+
+To use it, you need to use the 'command' feature of the .ssh/authorized_keys
+file, which causes OpenSSH to run a given command whenever a given SSH key
+connects (instead of allowing the owner of the key to run any command). This
+ensures that even if the backup key is compromised, all the attacker can do is
+make backups, and only then if they are connecting from the IP listed in 'from'.
+
+    command="/usr/bin/backup-snapshot" <key details>
+
+You'll need to create a YAML configuration file in /etc/backup-snapshot.conf
+that describes how to create the snapshot. Here's an example:
+
+    services:
+      - lorry-controller-minion@1.service
+      - gerrit.service
+
+    volume: /dev/vg0/gerrit
+
+To test this out, run:
+
+    rsync root@192.168.0.1: /srv/backup --rsync-path="/usr/bin/backup-snapshot"
+
+There is a Perl script named 'rrsync' that does something similar:
+
+    http://git.baserock.org/cgi-bin/cgit.cgi/delta/rsync.git/tree/support/rrsync
+
+'''
+
+
+import contextlib
+import logging
+import os
+import signal
+import shlex
+import subprocess
+import sys
+import tempfile
+import time
+import traceback
+import yaml
+
+
+CONFIG_FILE = '/etc/backup-snapshot.conf'
+
+
+def status(msg, *format):
+    # Messages have to go on stderr because rsync communicates on stdout.
+    logging.info(msg, *format)
+    sys.stderr.write(msg % format + '\n')
+
+
+def run_command(argv):
+    '''Run a command, raising an exception on failure.
+
+    Output on stdout is returned.
+    '''
+    logging.debug("Running: %s", argv)
+    output = subprocess.check_output(argv, close_fds=True)
+
+    logging.debug("Output: %s", output)
+    return output
+
+
+@contextlib.contextmanager
+def pause_services(services):
+    '''Stop a set of systemd services for the duration of a 'with' block.'''
+
+    logging.info("Pausing services: %s", services)
+    try:
+        for service in services:
+            run_command(['systemctl', 'stop', service])
+        yield
+    finally:
+        for service in services:
+            run_command(['systemctl', 'start', service])
+        logging.info("Restarted services: %s", services)
+
+
+def snapshot_volume(volume_path, suffix=None):
+    '''Create a snapshot of an LVM volume.'''
+
+    volume_group_path, volume_name = os.path.split(volume_path)
+
+    if suffix is None:
+        suffix = time.strftime('-backup-%Y-%m-%d')
+    snapshot_name = volume_name + suffix
+
+    logging.info("Snapshotting volume %s as %s", volume_path, snapshot_name)
+    run_command(['lvcreate', '--name', snapshot_name, '--snapshot', volume_path, '--extents', '100%ORIGIN', '--permission=r'])
+
+    snapshot_path = os.path.join(volume_group_path, snapshot_name)
+    return snapshot_path
+
+
+def delete_volume(volume_path):
+    '''Delete an LVM volume or snapshot.'''
+
+    # Sadly, --force seems necessary, because activation applies to the whole
+    # volume group rather than to the individual volumes so we can't deactivate
+    # only the snapshot before removing it.
+    logging.info("Deleting volume %s", volume_path)
+    run_command(['lvremove', '--force', volume_path])
+
+
+@contextlib.contextmanager
+def mount(block_device, path=None):
+    '''Mount a block device for the duration of 'with' block.'''
+
+    if path is None:
+        path = tempfile.mkdtemp()
+        tempdir = path
+        logging.debug('Created temporary directory %s', tempdir)
+    else:
+        tempdir = None
+
+    try:
+        run_command(['mount', block_device, path])
+        try:
+            yield path
+        finally:
+            run_command(['umount', path])
+    finally:
+        if tempdir is not None:
+            logging.debug('Removed temporary directory %s', tempdir)
+            os.rmdir(tempdir)
+
+
+def load_config(filename):
+    '''Load configuration from a YAML file.'''
+
+    logging.info("Loading config from %s", filename)
+    with open(filename, 'r') as f:
+        config = yaml.safe_load(f)
+
+    logging.debug("Config: %s", config)
+    return config
+
+
+def get_rsync_sender_flag(rsync_commandline):
+    '''Parse an 'rsync --server' commandline to get the --sender ID.
+
+    This parses a remote commandline, so be careful.
+
+    '''
+    args = shlex.split(rsync_commandline)
+    if args[0] != 'rsync':
+        raise RuntimeError("Not passed an rsync commandline.")
+
+    for i, arg in enumerate(args):
+        if arg == '--sender':
+            sender = args[i + 1]
+            return sender
+    else:
+        raise RuntimeError("Did not find --sender flag.")
+
+
+def run_rsync_server(source_path, sender_flag):
+    # Adding '/' to the source_path tells rsync that we want the /contents/
+    # of that directory, not the directory itself.
+    #
+    # You'll have realised that it doesn't actually matter what remote path the
+    # user passes to their local rsync.
+    rsync_command = ['rsync', '--server', '--sender', sender_flag, '.',
+                     source_path + '/']
+    logging.debug("Running: %s", rsync_command)
+    subprocess.check_call(rsync_command, stdout=sys.stdout)
+
+
+def main():
+    logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s',
+                        datefmt='%Y-%m-%d %H:%M:%S',
+                        filename='/var/log/backup-snapshot.log',
+                        level=logging.DEBUG)
+
+    logging.debug("Running as UID %i GID %i", os.getuid(), os.getgid())
+
+    # Ensure that clean up code (various 'finally' blocks in the functions
+    # above) always runs. This is important to ensure we never leave services
+    # stopped if the process is interrupted somehow.
+
+    signal.signal(signal.SIGHUP, signal.default_int_handler)
+
+    config = load_config(CONFIG_FILE)
+
+    # Check commandline early, so we don't stop services just to then
+    # give an error message.
+    rsync_command = os.environ.get('SSH_ORIGINAL_COMMAND', '')
+    logging.info("Original SSH command: %s", rsync_command)
+
+    if len(rsync_command) == 0:
+        # For testing only -- this can only happen if
+        # ~/.ssh/authorized_keys isn't set up as described above.
+        logging.info("Command line: %s", sys.argv)
+        rsync_command = 'rsync ' + ' '.join(sys.argv[1:])
+
+    # We want to ignore as much as possible of the
+    # SSH_ORIGINAL_COMMAND, because it's a potential attack vector.
+    # If an attacker has somehow got hold of the backup SSH key,
+    # they can pass whatever they want, so we hardcode the 'rsync'
+    # commandline here instead of honouring what the user passed
+    # in. We can anticipate everything except the '--sender' flag.
+    sender_flag = get_rsync_sender_flag(rsync_command)
+
+    with pause_services(config['services']):
+        snapshot_path = snapshot_volume(config['volume'])
+
+    try:
+        with mount(snapshot_path) as mount_path:
+            run_rsync_server(mount_path, sender_flag)
+
+        status("rsync server process exited with success.")
+    finally:
+        delete_volume(snapshot_path)
+
+
+try:
+    status('backup-snapshot started')
+    main()
+except RuntimeError as e:
+    sys.stderr.write('ERROR: %s' % e)
+except Exception as e:
+    logging.debug(traceback.format_exc())
+    raise
diff --git a/baserock_backup/backup.sh b/baserock_backup/backup.sh
new file mode 100755
index 00000000..f16ba447
--- /dev/null
+++ b/baserock_backup/backup.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+# These aren't normal invocations of rsync: the targets use the
+# 'command' option in /root/.ssh/authorized_keys to force execution of
+# the 'backup-snapshot' script at the remote end, which then starts the
+# rsync server process. So the backup SSH key can only be used to make
+# backups, nothing more.
+
+# Don't make the mistake of trying to run this from a systemd unit. There is
+# some brokenness in systemd that causes the SSH connection forwarding to not
+# work, so you will not be able to connect to the remote machines.
+
+# Database
+/usr/bin/rsync --archive --delete-before --delete-excluded \
+    --hard-links --human-readable --progress --sparse \
+    root@192.168.222.30: /srv/backup/database
+date > /srv/backup/database.timestamp
+
+# Gerrit
+/usr/bin/rsync --archive --delete-before --delete-excluded \
+    --hard-links --human-readable --progress --sparse \
+    --exclude='cache/' --exclude='tmp/' \
+    root@192.168.222.69: /srv/backup/gerrit
+date > /srv/backup/gerrit.timestamp
+
diff --git a/baserock_backup/instance-config.yml b/baserock_backup/instance-config.yml
new file mode 100644
index 00000000..327b84e9
--- /dev/null
+++ b/baserock_backup/instance-config.yml
@@ -0,0 +1,29 @@
+# Configuration for a machine that runs data backups of baserock.org.
+#
+# The current backup machine is not a reproducible deployment, but this
+# playbook should be easily adaptable to produce a properly reproducible
+# one.
+---
+- hosts: baserock-backup1
+  gather_facts: false
+  tasks:
+    - name: user for running backups
+      user: name=backup
+
+    # You'll need to copy in the SSH key manually for this user.
+
+    - name: SSH config for backup user
+      copy: src=ssh_config dest=/home/backup/.ssh/config
+
+    - name: backup script
+      copy: src=backup.sh dest=/home/backup/backup.sh mode=755
+
+    # You will need https://github.com/ansible/ansible-modules-core/pull/986
+    # for this to work.
+    - name: backup cron job, runs every day at midnight
+      cron:
+        hour: 00
+        minute: 00
+        job: /home/backup/backup.sh
+        name: baserock.org data backup
+        user: backup
diff --git a/baserock_backup/ssh_config b/baserock_backup/ssh_config
new file mode 100644
index 00000000..e14b38a0
--- /dev/null
+++ b/baserock_backup/ssh_config
@@ -0,0 +1,4 @@
+# SSH configuration to route all requests to baserock.org systems
+# via the frontend system, 185.43.218.170.
+Host 192.168.222.*
+    ProxyCommand ssh backup@185.43.218.170 -W %h:%p
diff --git a/baserock_gerrit/backup-snapshot.conf b/baserock_gerrit/backup-snapshot.conf
new file mode 100644
index 00000000..e8e2f3fc
--- /dev/null
+++ b/baserock_gerrit/backup-snapshot.conf
@@ -0,0 +1,5 @@
+services:
+  - lorry-controller-minion@1.service
+  - gerrit.service
+
+volume: /dev/vg0/gerrit
diff --git a/baserock_gerrit/instance-backup-config.yml b/baserock_gerrit/instance-backup-config.yml
new file mode 100644
index 00000000..60434b5d
--- /dev/null
+++ b/baserock_gerrit/instance-backup-config.yml
@@ -0,0 +1,29 @@
+# Instance backup configuration for the baserock.org Gerrit system.
+---
+- hosts: gerrit
+  gather_facts: false
+  vars:
+    FRONTEND_IP: 192.168.222.21
+  tasks:
+    - name: backup-snapshot script
+      copy: src=../backup-snapshot dest=/usr/bin/backup-snapshot mode=755
+
+    - name: backup-snapshot config
+      copy: src=backup-snapshot.conf dest=/etc/backup-snapshot.conf
+
+    # Would be good to limit this to 'backup' user.
+    - name: passwordless sudo
+      lineinfile: dest=/etc/sudoers state=present line='%wheel ALL=(ALL) NOPASSWD:ALL' validate='visudo -cf %s'
+
+    # We need to give the backup automation 'root' access, because it needs to
+    # manage system services, LVM volumes, and mounts, and because it needs to
+    # be able to read private data. The risk of having the backup key
+    # compromised is mitigated by only allowing it to execute the
+    # 'backup-snapshot' script, and limiting the hosts it can be used from.
+    - name: access for backup SSH key
+      authorized_key:
+        user: root
+        key: "{{ lookup('file', '../keys/backup.key.pub') }}"
+        # Quotes are important in these options, the OpenSSH server will reject
+        # the entry if the 'from' or 'command' values are not quoted.
+        key_options: 'from="{{FRONTEND_IP}}",no-agent-forwarding,no-port-forwarding,no-X11-forwarding,command="/usr/bin/backup-snapshot"'
diff --git a/database/backup-snapshot.conf b/database/backup-snapshot.conf
new file mode 100644
index 00000000..cb3a2ff0
--- /dev/null
+++ b/database/backup-snapshot.conf
@@ -0,0 +1,4 @@
+services:
+  - mariadb.service
+
+volume: /dev/vg0/database
diff --git a/database/instance-backup-config.yml b/database/instance-backup-config.yml
new file mode 100644
index 00000000..79e5ff6c
--- /dev/null
+++ b/database/instance-backup-config.yml
@@ -0,0 +1,26 @@
+# Instance backup configuration for the baserock.org database.
+---
+- hosts: database-mariadb
+  gather_facts: false
+  sudo: yes
+  vars:
+    FRONTEND_IP: 192.168.222.21
+  tasks:
+    - name: backup-snapshot script
+      copy: src=../backup-snapshot dest=/usr/bin/backup-snapshot mode=755
+
+    - name: backup-snapshot config
+      copy: src=backup-snapshot.conf dest=/etc/backup-snapshot.conf
+
+    # We need to give the backup automation 'root' access, because it needs to
+    # manage system services, LVM volumes, and mounts, and because it needs to
+    # be able to read private data. The risk of having the backup key
+    # compromised is mitigated by only allowing it to execute the
+    # 'backup-snapshot' script, and limiting the hosts it can be used from.
+    - name: access for backup SSH key
+      authorized_key:
+        user: root
+        key: "{{ lookup('file', '../keys/backup.key.pub') }}"
+        # Quotes are important in these options, the OpenSSH server will reject
+        # the entry if the 'from' or 'command' values are not quoted.
+        key_options: 'from="{{FRONTEND_IP}}",no-agent-forwarding,no-port-forwarding,no-X11-forwarding,command="/usr/bin/backup-snapshot"'
diff --git a/frontend/instance-backup-config.yml b/frontend/instance-backup-config.yml
new file mode 100644
index 00000000..8f7ca550
--- /dev/null
+++ b/frontend/instance-backup-config.yml
@@ -0,0 +1,23 @@
+# Instance backup configuration for the baserock.org frontend system.
+#
+# We don't need to back anything up from this system, but the backup
+# SSH key needs access to it in order to SSH to the other systems on the
+# internal network.
+---
+- hosts: frontend-haproxy
+  gather_facts: false
+  sudo: yes
+  vars:
+    # The 'backup' key cannot be used to SSH into the 'frontend' machine except
+    # from this IP.
+    PERMITTED_BACKUP_HOSTS: 82.70.136.246/32
+  tasks:
+    - name: backup user
+      user:
+        name: backup
+
+    - name: authorize backup public key
+      authorized_key:
+        user: backup
+        key: "{{ lookup('file', '../keys/backup.key.pub') }}"
+        key_options: 'from="{{ PERMITTED_BACKUP_HOSTS }}",no-agent-forwarding,no-X11-forwarding'
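
Review note on get_rsync_sender_flag() in backup-snapshot: the function parses SSH_ORIGINAL_COMMAND, which is attacker-controlled if the backup key leaks. As written it indexes args[0] and args[i + 1] without bounds checks, so a whitespace-only command or one ending in '--sender' would raise IndexError rather than the RuntimeError that the top-level handler reports cleanly. The following is a minimal sketch of a stricter parser, not the patch's code: the name parse_sender_flag is hypothetical, and it assumes the argument right after '--sender' is the only part of the command worth keeping, as the patch does.

```python
import shlex


def parse_sender_flag(commandline):
    """Return the argument following '--sender' in an rsync server command.

    Hypothetical helper, not part of the patch. Any input that does not
    look like 'rsync ... --sender <flags> ...' raises RuntimeError.
    """
    try:
        args = shlex.split(commandline)
    except ValueError:
        # shlex raises ValueError on unbalanced quotes.
        raise RuntimeError("Could not parse commandline.")

    if not args or args[0] != 'rsync':
        raise RuntimeError("Not passed an rsync commandline.")

    try:
        return args[args.index('--sender') + 1]
    except (ValueError, IndexError):
        # ValueError: '--sender' is absent. IndexError: nothing follows it.
        raise RuntimeError("Did not find --sender flag.")
```

Every malformed input then takes the same RuntimeError path, so the script exits before any service is stopped or any snapshot is created.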