From 69d6d1a76de7c9f4c1274ada238fe5295fe7dc30 Mon Sep 17 00:00:00 2001
From: Sam Thursfield
Date: Fri, 13 Oct 2017 13:10:15 +0100
Subject: Rename README so it gets displayed in GitLab

---
 README.md   | 584 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.mdwn | 584 ------------------------------------------------------------
 2 files changed, 584 insertions(+), 584 deletions(-)
 create mode 100644 README.md
 delete mode 100644 README.mdwn

diff --git a/README.md b/README.md
new file mode 100644
index 00000000..2f4c08d5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,584 @@

Baserock project public infrastructure
======================================

This repository contains the definitions for all of the Baserock Project's
infrastructure. This includes every service used by the project, except for
the mailing lists (hosted by [Pepperfish]), the wiki (hosted by [Branchable])
and the GitLab CI runners (set up by Javier Jardón).

Some of these systems are Baserock systems. This has proved an obstacle to
keeping them up to date with security updates, and we plan to switch everything
to run on mainstream distros in future.

All files necessary for (re)deploying the systems should be contained in this
Git repository. Private tokens should be encrypted using
[ansible-vault](https://www.ansible.com/blog/2014/02/19/ansible-vault).

[Pepperfish]: http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo
[Branchable]: http://www.branchable.com/


General notes
-------------

When instantiating a machine that will be public, remember to give shell
access to everyone on the ops team. This can be done using a post-creation
customisation script that injects all of their SSH keys. The SSH public
keys of the Baserock Operations team are collected in
`baserock-ops-team.cloud-config`.

Ensure SSH password login is disabled in all systems you deploy! See:  for
why. The Ansible playbook `admin/sshd_config.yaml` can ensure that all systems
have password login disabled.


Administration
--------------

You can use [Ansible] to automate tasks on the baserock.org systems.

To run a playbook:

    ansible-playbook -i hosts $PLAYBOOK.yaml

To run an ad-hoc command (upgrading, for example):

    ansible -i hosts fedora -m command -a 'sudo dnf update -y'
    ansible -i hosts ubuntu -m command -a 'sudo apt-get update -y'

[Ansible]: http://www.ansible.com


Security updates
----------------

Fedora security updates can be watched here: . Ubuntu issues security
advisories here: . The Baserock reference systems don't have such a service.
The [LWN Alerts](https://lwn.net/Alerts/) service gives you info from all
major Linux distributions.

If there is a vulnerability discovered in some software we use, we might need
to upgrade all of the systems that use that component at baserock.org.

Bear in mind some systems are not accessible except via the frontend-haproxy
system. Those are usually less at risk than those that face the web directly.
Also bear in mind we use OpenStack security groups to block most ports.

### Prepare the patch for Baserock systems

First, you need to update the Baserock reference system definitions with a
fixed version of the component. Build that and test that it works. Submit
the patch to gerrit.baserock.org, get it reviewed, and merged. Then cherry-pick
that patch into infrastructure.git.
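For example, once the fix has landed in definitions.git, the cherry-pick into
an infrastructure.git checkout might look like this (the commit ID is a
placeholder):

    git fetch git://git.baserock.org/baserock/baserock/definitions master
    # <commit-id-of-the-fix> stands in for the SHA of the merged fix
    git cherry-pick <commit-id-of-the-fix>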
This is a long-winded process. There are shortcuts you can take, although
someone still has to complete the full process described above at some point.

* You can modify the infrastructure.git definitions directly and start rebuilding
  the infrastructure systems right away, to avoid waiting for the Baserock patch
  review process.

* You can add the new version of the component as a stratum that sits above
  everything else in the build graph. For example, to do a 'hot-fix' for GLIBC,
  add a 'glibc-hotfix' stratum containing the new version to all of the systems
  you need to upgrade. Rebuilding them will be quick because you just need to
  build GLIBC, and can reuse the cached artifacts for everything else. The new
  GLIBC will overwrite the one that is lower down in the build graph in the
  resulting filesystem. Of course, if the new version of the component is not
  ABI compatible then this approach will break things. Be careful.

### Check the inventory

Make sure the Ansible inventory file is up to date, and that you have access to
all machines. Run this:

    ansible \* -i ./hosts -m ping

You should see lots of this sort of output:

    mail | success >> {
        "changed": false,
        "ping": "pong"
    }

    frontend-haproxy | success >> {
        "changed": false,
        "ping": "pong"
    }

You may find some host key errors like this:

    paste | FAILED => SSH Error: Host key verification failed.
    It is sometimes useful to re-run the command using -vvvv, which prints SSH
    debug output to help diagnose the issue.

If you have a host key problem, that could be because somebody redeployed
the system since the last time you connected to it with SSH, and did not
transfer the SSH host keys from the old system to the new system. Check with
other ops team members about this. If you are sure the new host keys can
be trusted, you can remove the old ones with `ssh-keygen -R 192.168.x.y`,
where 192.168.x.y is the internal IP address of the machine. You'll then be
prompted to accept the new ones when you run Ansible again.

Once all machines respond to the Ansible 'ping' module, double-check that
every machine you can see in the OpenStack Horizon dashboard has a
corresponding entry in the 'hosts' file, to ensure the next steps operate
on all of the machines.

### Check and upgrade Fedora systems

> Bear in mind that only the latest 2 versions of Fedora receive security
> updates. If any machines are not running the latest version of Fedora,
> you should redeploy them with the latest version. See the instructions below
> on how to (re)deploy each machine. You should deploy a new instance of a
> system and test it *before* terminating the existing instance. Switching
> over should be a matter of changing either its floating IP address or the
> IP address in baserock_frontend/haproxy.cfg.

You can find out what version of Fedora is in use with this command:

    ansible fedora -i hosts -m setup -a 'filter=ansible_distribution_version'

Check what version of a package is in use with this command (using GLIBC as an
example). You can compare this against Fedora package changelogs at
[Koji](https://koji.fedoraproject.org).

    ansible fedora -i hosts -m command -a 'rpm -q glibc --qf "%{VERSION}.%{RELEASE}\n"'

You can see what updates are available using the `dnf updateinfo info` command:

    ansible -i hosts fedora -m command -a 'dnf updateinfo info glibc'

You can then use `dnf upgrade -y` to install all available updates, or give the
name of a package to upgrade just that package. Be aware that DNF is quite
slow, and if you forget to pass `-y` then it will hang forever waiting for
input.
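For example, to upgrade just GLIBC across the Fedora machines (the package
name here is only an illustration; substitute whatever component is affected):

    ansible -i hosts fedora -m command -a 'sudo dnf upgrade -y glibc'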
You will then need to restart services. The `dnf needs-restarting` command
might be useful, but rebooting the whole machine is probably easiest.

### Check and upgrade Ubuntu systems

> Bear in mind that only the latest release and the latest LTS release of
> Ubuntu receive any security updates.

Find out what version of Ubuntu is in use with this command:

    ansible ubuntu -i hosts -m setup -a 'filter=ansible_distribution_version'

Check what version of a given package is in use with this command (using GLIBC
as an example).

    ansible -i hosts ubuntu -m command -a 'dpkg-query --show libc6'

Check for available updates, and what they contain:

    ansible -i hosts ubuntu -m command -a 'apt-cache policy libc6'
    ansible -i hosts ubuntu -m command -a 'apt-get changelog libc6' | head -n 20

You can update all the packages with:

    ansible -i hosts ubuntu -m command -a 'apt-get upgrade -y' --sudo

You will then need to restart services. Rebooting the machine is probably
easiest.

### Check and upgrade Baserock systems

Check what version of a given package is in use with this command (using GLIBC
as an example). Ideally the Baserock reference systems would have a query tool
for this info, but for now we have to look at the JSON metadata file directly.

    ansible -i hosts baserock -m command \
        -a "grep '\"\(sha1\|repo\|original_ref\)\":' /baserock/glibc-bins.meta"

The default Baserock machine layout uses Btrfs for the root filesystem. Filling
up a Btrfs disk results in unpredictable behaviour. Before deploying any system
upgrades, check that each machine has enough free disk space to hold an
upgrade. Allow at least 4GB of free space, to be safe.

    ansible -i hosts baserock -m command -a "df -h /"

A good way to free up space is to remove old system-versions using the
`system-version-manager` tool. There may be other things that are
unnecessarily taking up space in the root file system, too.

Ideally, at this point you've prepared a patch for definitions.git to fix
the security issue in the Baserock reference systems, and it has been merged.
In that case, pull from the reference systems into infrastructure.git, using
`git pull git://git.baserock.org/baserock/baserock/definitions master`.

If the necessary patch isn't merged in definitions.git, it's still best to
merge 'master' from there into infrastructure.git, and then cherry-pick the
patch from Gerrit on top.

You then need to build and upgrade the systems one by one. Do this from the
'devel-system' machine in the same OpenStack cloud that hosts the
infrastructure. Baserock upgrades currently involve transferring the whole
multi-gigabyte system image, so you *must* have a fast connection to the
target.

Each Baserock system has its own deployment instructions. Each should have
a deployment .morph file that you can pass to `morph upgrade`. For example,
to deploy an upgrade to git.baserock.org:

    morph upgrade --local-changes=ignore \
        baserock_trove/baserock_trove.morph gbo.VERSION_LABEL=2016-02-19

Once this completes successfully, rebooting the system should bring up the
new version. You may want to check that the new `/etc` is correct; you can
do this inside the machine by mounting `/dev/vda` and looking in
`systems/$VERSION_LABEL/run/etc`.
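A minimal check from inside the machine might look like this (a sketch: the
mount point and the version label are example values):

    mount /dev/vda /mnt
    ls /mnt/systems/2016-02-19/run/etc   # version label shown is an example
    umount /mnt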
If you want to revert the upgrade, use `system-version-manager list` and
`system-version-manager set-default <version>` to set the previous
version as the default, then reboot. If the system doesn't boot at all,
reboot it while you have the graphical console open in Horizon, and you
should be able to press `ESC` fast enough to get the boot menu open. This
will allow booting into previous versions of the system. (You shouldn't
have any problems, though, since of course we test everything regularly.)

Beware of .

For cache.baserock.org, you can reuse the deployment instructions for
git.baserock.org. Try:

    morph upgrade --local-changes=ignore \
        baserock_trove/baserock_trove.morph \
        gbo.update-location=root@cache.baserock.org \
        gbo.VERSION_LABEL=2016-02-19


Deployment to OpenStack
-----------------------

The intention is that all of the systems defined here are deployed to an
OpenStack cloud. The instructions here hardcode some details about the specific
tenancy at [DataCentred](http://www.datacentred.io) that the Baserock project
uses. It should be easy to adapt them for other OpenStack hosts, though.

### Credentials

The instructions below assume you have the following environment variables set
according to the OpenStack host you are deploying to:

 - `OS_AUTH_URL`
 - `OS_TENANT_NAME`
 - `OS_USERNAME`
 - `OS_PASSWORD`

When using `morph deploy` to deploy to OpenStack, you will need to set these
variables instead, because currently Morph does not honour the standard ones.
See: .

 - `OPENSTACK_USER=$OS_USERNAME`
 - `OPENSTACK_PASSWORD=$OS_PASSWORD`
 - `OPENSTACK_TENANT=$OS_TENANT_NAME`

The `location` field in the deployment .morph file will also need to point to
the correct `$OS_AUTH_URL`.
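Concretely, before running `morph deploy` you might set something like the
following (a minimal sketch; the values simply mirror the standard OpenStack
variables listed above):

    export OPENSTACK_USER="$OS_USERNAME"
    export OPENSTACK_PASSWORD="$OS_PASSWORD"
    export OPENSTACK_TENANT="$OS_TENANT_NAME"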
### Firewall / Security Groups

The instructions assume the presence of a set of security groups. You can
create these by running the following Ansible playbook:

    ansible-playbook -i hosts firewall.yaml

### Placeholders

The commands below use a couple of placeholders like `$network_id`; you can set
them in your environment so that you can copy and paste the commands below
as-is.

 - `export fedora_image_id=...` (find this with `glance image-list`)
 - `export network_id=...` (find this with `neutron net-list`)
 - `export keyname=...` (find this with `nova keypair-list`)

The `$fedora_image_id` should reference a Fedora Cloud image. You can import
these from . At the time of writing, these instructions were tested with
Fedora Cloud 23 for x86_64.


Backups
-------

Backups of git.baserock.org's data volume are run by, and stored on, a
Codethink-managed machine named 'access'. They will need to be migrated off
this system before long. The backups are taken without pausing services or
snapshotting the data, so they will not be 100% clean. The current
git.baserock.org data volume does not use LVM and cannot be easily snapshotted.

Backups of 'gerrit' and 'database' are handled by the
'baserock_backup/backup.py' script. This currently runs on an instance in
Codethink's internal OpenStack cloud.

Instances themselves are not backed up. In the event of a crisis we will
redeploy them from the infrastructure.git repository. There should be nothing
valuable stored outside of the data volumes that are backed up.

To prepare the infrastructure to run the backup scripts you will need to run
the following playbooks:

    ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml
    ansible-playbook -i hosts baserock_database/instance-backup-config.yml
    ansible-playbook -i hosts baserock_gerrit/instance-backup-config.yml

NOTE: to run these playbooks you need to have the public SSH key of the backup
instance in `keys/backup.key.pub`.


Systems
-------

### Front-end

The front-end provides a reverse proxy, to allow more flexible routing than
simply pointing each subdomain to a different instance using separate public
IPs. It also provides a starting point for future load-balancing and failover
configuration.

To deploy this system:

    nova boot frontend-haproxy \
        --key-name=$keyname \
        --flavor=dc1.1x0 \
        --image=$fedora_image_id \
        --nic="net-id=$network_id" \
        --security-groups default,gerrit,shared-artifact-cache,web-server \
        --user-data ./baserock-ops-team.cloud-config
    ansible-playbook -i hosts baserock_frontend/image-config.yml
    ansible-playbook -i hosts baserock_frontend/instance-config.yml
    ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml

    ansible -i hosts -m service -a 'name=haproxy enabled=true state=started' \
        --sudo frontend-haproxy

The baserock_frontend system is stateless.

Full HAProxy 1.5 documentation: .

If you want to add a new service to the Baserock Project infrastructure via
the frontend, do the following:

- request a subdomain that points at 185.43.218.170 (the frontend)
- alter the haproxy.cfg file in the baserock_frontend/ directory in this repo
  as necessary to proxy requests to the real instance
- run the baserock_frontend/instance-config.yml playbook
- run `ansible -i hosts -m service -a 'name=haproxy enabled=true
  state=restarted' --sudo frontend-haproxy`

OpenStack doesn't provide any kind of internal DNS service, so you must use
the fixed IP address of each instance in the proxy configuration.

The internal IP address of this machine is hardcoded in some places (beyond
the usual haproxy.cfg file); use `git grep` to find all of them. You'll need
to update all the relevant config files. We really need some internal DNS
system to avoid this hassle.

### Trove

To deploy to production, run these commands in a Baserock 'devel'
or 'build' system.

    nova volume-create \
        --display-name git.baserock.org-home \
        --display-description '/home partition of git.baserock.org' \
        --volume-type Ceph \
        300

    git clone git://git.baserock.org/baserock/baserock/infrastructure.git
    cd infrastructure

    morph build systems/trove-system-x86_64.morph
    morph deploy baserock_trove/baserock_trove.morph

    nova boot git.baserock.org \
        --key-name $keyname \
        --flavor 'dc1.8x16' \
        --image baserock_trove \
        --nic "net-id=$network_id,v4-fixed-ip=192.168.222.58" \
        --security-groups default,git-server,web-server,shared-artifact-cache \
        --user-data baserock-ops-team.cloud-config

    nova volume-attach git.baserock.org <volume-id> /dev/vdb

    # Note: if this floating IP is not available, you will have to change
    # the DNS records at the DNS provider.
    nova add-floating-ip git.baserock.org 185.43.218.183

    ansible-playbook -i hosts baserock_trove/instance-config.yml

    # Before configuring the Trove you will need to create some SSH
    # keys for it. You can also use existing keys.

    mkdir private
    ssh-keygen -N '' -f private/lorry.key
    ssh-keygen -N '' -f private/worker.key
    ssh-keygen -N '' -f private/admin.key

    # Now you can finish the configuration of the Trove with:

    ansible-playbook -i hosts baserock_trove/configure-trove.yml
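Once the playbooks have finished, a quick sanity check of the new Trove might
look like this (a sketch using the public endpoints mentioned elsewhere in
this document; adjust the hostname if the instance is not yet behind the
production DNS):

    # Assumes DNS already points at this Trove
    curl -I https://git.baserock.org/
    git ls-remote git://git.baserock.org/baserock/baserock/definitions HEAD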
### OSTree artifact cache

To deploy this system to production:

    nova volume-create \
        --display-name ostree-volume \
        --display-description 'OSTree cache volume' \
        --volume-type Ceph \
        300

    nova boot ostree.baserock.org \
        --key-name $keyname \
        --flavor dc1.2x8.40 \
        --image $fedora_image_id \
        --nic "net-id=$network_id,v4-fixed-ip=192.168.222.153" \
        --security-groups default,web-server \
        --user-data ./baserock-ops-team.cloud-config

    nova volume-attach ostree.baserock.org <volume-id> /dev/vdb

    ansible-playbook -i hosts baserock_ostree/image-config.yml
    ansible-playbook -i hosts baserock_ostree/instance-config.yml
    ansible-playbook -i hosts baserock_ostree/ostree-access-config.yml

SSL certificates
================

The certificates used for our infrastructure are provided for free
by Let's Encrypt. These certificates expire every 3 months. Here we
explain how to renew the certificates, and how to deploy them.

Generation of certificates
--------------------------

> Note: This should be automated in the next upgrade. The instructions
> sound like a lot of effort.

To generate the SSL certs, first you need to clone the following repositories:

    git clone https://github.com/lukas2511/letsencrypt.sh.git
    git clone https://github.com/mythic-beasts/letsencrypt-mythic-dns01.git

The version used the first time was `0.4.0`, with SHA
`116386486b3749e4c5e1b4da35904f30f8b2749b` (noted here in case future releases
break these instructions).

Now, inside the letsencrypt.sh repo, create a `domains.txt` file listing the
domains and subdomains:

    cd letsencrypt.sh
    cat >domains.txt <<'EOF'
    baserock.org
    docs.baserock.org download.baserock.org irclogs.baserock.org ostree.baserock.org paste.baserock.org spec.baserock.org
    git.baserock.org
    EOF

And the `config` file needed:

    cat >config <<'EOF'
    CONTACT_EMAIL="admin@baserock.org"
    HOOK="../letsencrypt-mythic-dns01/letsencrypt-mythic-dns01.sh"
    CHALLENGETYPE="dns-01"
    EOF

Create a `dnsapi.config.txt` file with the decrypted contents of
`private/dnsapi.config.txt`. To show the contents of this file, run the
following in an `infrastructure.git` repo checkout:

    ansible-vault view private/dnsapi.config.txt

Now, to generate the certs, run:

    ./dehydrated -c

> If this is the first time, you will be asked to run
> `./dehydrated --register --accept-terms`

All the generated certificates will be in the `certs` folder.
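To confirm that renewal actually produced fresh certificates, you can check
the expiry date of one of them (one domain shown as an example; the file
layout is the one dehydrated produces):

    openssl x509 -noout -enddate -in certs/git.baserock.org/cert.pem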
To construct the certificate files that live in the `certs` and `private`
folders of this repository, you will have to:

    cd certs
    mkdir -p tmp/private tmp/certs

    # Create some full certs including the key, for the services that need
    # them in this form
    cat git.baserock.org/cert.csr git.baserock.org/cert.pem git.baserock.org/chain.pem git.baserock.org/privkey.pem > tmp/private/git-with-key.pem
    cat docs.baserock.org/cert.csr docs.baserock.org/cert.pem docs.baserock.org/chain.pem docs.baserock.org/privkey.pem > tmp/private/frontend-with-key.pem

    # Copy key files
    cp git.baserock.org/privkey.pem tmp/private/git.pem
    cp docs.baserock.org/privkey.pem tmp/private/frontend.pem

    # Copy cert files
    cp git.baserock.org/cert.csr tmp/certs/git.csr
    cp git.baserock.org/cert.pem tmp/certs/git.pem
    cp git.baserock.org/chain.pem tmp/certs/git-chain.pem
    cp docs.baserock.org/cert.csr tmp/certs/frontend.csr
    cp docs.baserock.org/cert.pem tmp/certs/frontend.pem
    cp docs.baserock.org/chain.pem tmp/certs/frontend-chain.pem

    # Create full certs without keys
    cat git.baserock.org/cert.csr git.baserock.org/cert.pem git.baserock.org/chain.pem > tmp/certs/git-full.pem
    cat docs.baserock.org/cert.csr docs.baserock.org/cert.pem docs.baserock.org/chain.pem > tmp/certs/frontend-full.pem

Before replacing the current ones, make sure you **encrypt** the ones that
contain keys (located in the `private` folder):

    ansible-vault encrypt tmp/private/*

And copy them into the repository:

    cp tmp/certs/* ../../certs/
    cp tmp/private/* ../../private/


Deploy certificates
-------------------

For `git.baserock.org`, just run:

    ansible-playbook -i hosts baserock_trove/configure-trove.yml

This playbook copies the certificates to the Trove and runs the scripts that
configure them.

For the frontend, run:

    ansible-playbook -i hosts baserock_frontend/instance-config.yml
    ansible -i hosts -m service -a 'name=haproxy enabled=true state=restarted' --sudo frontend-haproxy

This installs the certificates and then restarts the services that need them.


GitLab CI runners setup
=======================

Baserock uses [GitLab CI] for build and test automation. For performance
reasons we provide our own runners and avoid using the free, shared runners
provided by GitLab. The runners are hosted at [DigitalOcean] and managed by
the 'baserock' team account there.

There is a persistent 'manager' machine with a public IP of 138.68.143.2 that
runs GitLab Runner and [docker-machine]. This doesn't run any builds itself --
we use the [autoscaling feature] of GitLab Runner to spawn new VMs for building
in. The configuration for this is in `/etc/gitlab-runner/config.toml`.

Each build occurs in a Docker container on one of the transient VMs. As per
the [\[runners.docker\] section] of `config.toml`, each gets a newly created
volume mounted at `/cache`. The YBD and BuildStream cache directories are
located there because jobs were running out of disk space when using the
default configuration.

There is a second persistent machine with a public IP of 46.101.48.48 that
hosts a Docker registry and a [Minio] cache. These services run as Docker
containers. The Docker registry exists to cache the Docker images we use,
which improves the spin-up time of the transient builder VMs, as documented
[here](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-docker-registry-mirroring).
The Minio cache is used for the [distributed caching] feature of GitLab CI.
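To inspect the autoscaling setup, you can list the registered runners and the
current transient build VMs from the manager machine (a sketch; it assumes you
have root SSH access there, and uses the standard GitLab Runner and
docker-machine commands):

    # Assumes root SSH access to the manager machine
    ssh root@138.68.143.2 'gitlab-runner list; docker-machine ls'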
[GitLab CI]: https://about.gitlab.com/features/gitlab-ci-cd/
[DigitalOcean]: https://cloud.digitalocean.com/
[docker-machine]: https://docs.docker.com/machine/
[autoscaling feature]: https://docs.gitlab.com/runner/configuration/autoscale.html
[Minio]: https://www.minio.io/
[\[runners.docker\] section]: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section
[distributed caching]: https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching