author     Pedro Alvarez <pedro.alvarez@codethink.co.uk>  2021-08-18 12:30:26 +0200
committer  Pedro Alvarez <pedro.alvarez@codethink.co.uk>  2021-09-20 12:40:51 +0100
commit     e609fa50c214bb87b42417cc283f70e28ecefd83 (patch)
tree       b255a5bffcdef075c4d721f09124d03c49a2c70c
parent     1e20f06e097f83d78871676711514be7440f7b50 (diff)
download   infrastructure-e609fa50c214bb87b42417cc283f70e28ecefd83.tar.gz
Update README for Terraform and other changes
-rw-r--r--  README.md  398
1 file changed, 59 insertions, 339 deletions
diff --git a/README.md b/README.md
index fb404f27..3489f30c 100644
--- a/README.md
+++ b/README.md
@@ -23,14 +23,15 @@ General notes
When instantiating a machine that will be public, remember to give shell
access to everyone on the ops team. This can be done using a post-creation
-customisation script that injects all of their SSH keys. The SSH public
-keys of the Baserock Operations team are collected in
-`baserock-ops-team.cloud-config.`.
+customisation script that injects all of their SSH keys.
-Ensure SSH password login is disabled in all systems you deploy! See:
-<https://testbit.eu/is-ssh-insecure/> for why. The Ansible playbook
-`admin/sshd_config.yaml` can ensure that all systems have password login
-disabled.
+Additionally, ensure SSH password login is disabled in all systems you deploy!
+See: <https://testbit.eu/is-ssh-insecure/> for why.
+
+The Ansible playbook `admin/sshd_config.yaml` can ensure that all systems have
+password login disabled and all of the team's SSH keys installed:
+
+    ansible-playbook -i hosts admin/sshd_config.yaml
Administration
@@ -44,7 +45,7 @@ To run a playbook:
To run an ad-hoc command (upgrading, for example):
- ansible -i hosts fedora -m command -a 'sudo dnf update -y'
+ ansible -i hosts ubuntu -m command -a 'sudo apt -y upgrade'
[Ansible]: http://www.ansible.com
@@ -52,11 +53,8 @@ To run an ad-hoc command (upgrading, for example):
Security updates
----------------
-Fedora security updates can be watched here:
-<https://bodhi.fedoraproject.org/updates/?type=security>.
-The Baserock reference systems doesn't have such a service. The [LWN
-Alerts](https://lwn.net/Alerts/) service gives you info from all major Linux
-distributions.
+The [LWN Alerts](https://lwn.net/Alerts/) service collects security alerts from
+all major Linux distributions.
If there is a vulnerability discovered in some software we use, we might need
to upgrade all of the systems that use that component at baserock.org.
@@ -65,29 +63,6 @@ Bear in mind some systems are not accessible except via the frontend-haproxy
system. Those are usually less at risk than those that face the web directly.
Also bear in mind we use OpenStack security groups to block most ports.
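+
+To check which ports a given group currently allows, you can list the security
+groups and their rules with the standard OpenStack CLI (assuming your OS_*
+credentials are already exported, as described under "Deployment to OpenStack"):
+
+    openstack security group list
+    openstack security group rule list <group-name>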
-### Prepare the patch for Baserock systems
-
-First, you need to update the Baserock reference system definitions with a
-fixed version of the component. Build that and test that it works. Submit
-the patch to gerrit.baserock.org, get it reviewed, and merged. Then cherry
-pick that patch into infrastructure.git.
-
-This a long-winded process. There are shortcuts you can take, although
-someone still has to complete the process described above at some point.
-
-* You can modify the infrastructure.git definitions directly and start rebuilding
- the infrastructure systems right away, to avoid waiting for the Baserock patch
- review process.
-
-* You can add the new version of the component as a stratum that sits above
- everything else in the build graph. For example, to do a 'hot-fix' for GLIBC,
- add a 'glibc-hotfix' stratum containing the new version to all of the systems
- you need to upgrade. Rebuilding them will be quick because you just need to
- build GLIBC, and can reuse the cached artifacts for everything else. The new
- GLIBC will overwrite the one that is lower down in the build graph in the
- resulting filesystem. Of course, if the new version of the component is not
- ABI compatible then this approach will break things. Be careful.
-
### Check the inventory
Make sure the Ansible inventory file is up to date, and that you have access to
@@ -120,100 +95,24 @@ every machine you can see in the OpenStack Horizon dashboard has a
corresponding entry in the 'hosts' file, to ensure the next steps operate
on all of the machines.
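+
+A quick way to check access is to ping every machine in the inventory with
+Ansible's `ping` module (this only verifies SSH and Python access; it does not
+tell you whether a machine is missing from the file):
+
+    ansible all -i hosts -m ping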
-### Check and upgrade Fedora systems
-
-> Bear in mind that only the latest 2 versions of Fedora receive security
-updates. If any machines are not running the latest version of Fedora,
-you should redeploy them with the latest version. See the instructions below
-on how to (re)deploy each machine. You should deploy a new instance of a system
-and test it *before* terminating the existing instance. Switching over should
-be a matter of changing either its floating IP address or the IP address in
-baserock_frontend/haproxy.conf.
-
-You can find out what version of Fedora is in use with this command:
-
- ansible fedora -i hosts -m setup -a 'filter=ansible_distribution_version'
-
-Check what version of a package is in use with this command (using GLIBC as an
-example). You can compare this against Fedora package changelogs at
-[Koji](https://koji.fedoraproject.org).
-
- ansible fedora -i hosts -m command -a 'rpm -q glibc --qf "%{VERSION}.%{RELEASE}\n"'
-
-You can see what updates are available using the `dnf updateinfo info' command.
-
- ansible -i hosts fedora -m command -a 'dnf updateinfo info glibc'
-
-You can then use `dnf upgrade -y` to install all available updates. Or give the
-name of a package to update just that package. Be aware that DNF is quite slow,
-and if you forget to pass `-y` then it will hang forever waiting for input.
-
-You will then need to restart services. The `dnf needs-restarting` command might be
-useful, but rebooting the whole machine is probably easiest.
-
-### Check and upgrade Baserock systems
-
-Check what version of a given package is in use with this command (using GLIBC
-as an example). Ideally Baserock reference systems would have a query tool for
-this info, but for now we have to look at the JSON metadata file directly.
-
- ansible -i hosts baserock -m command \
- -a "grep '\"\(sha1\|repo\|original_ref\)\":' /baserock/glibc-bins.meta"
-
-The default Baserock machine layout uses Btrfs for the root filesystem. Filling
-up a Btrfs disk results in unpredictable behaviour. Before deploying any system
-upgrades, check that each machine has enough free disk space to hold an
-upgrade. Allow for at least 4GB free space, to be safe.
-
- ansible -i hosts baserock -m command -a "df -h /"
-
-A good way to free up space is to remove old system-versions using the
-`system-version-manager` tool. There may be other things that are
-unnecessarily taking up space in the root file system, too.
+### Check and update Debian/Ubuntu systems
-Ideally, at this point you've prepared a patch for definitions.git to fix
-the security issue in the Baserock reference systems, and it has been merged.
-In that case, pull from the reference systems into infrastructure.git, using
-`git pull git://git.baserock.org/baserock/baserock/definitions master`.
+Check what version of a package is in use with this command (using NGINX as an
+example).
-If the necessary patch isn't merged in definitions.git, it's still best to
-merge 'master' from there into infrastructure.git, and then cherry-pick the
-patch from Gerrit on top.
+ ansible ubuntu -i hosts -m command -a 'dpkg -s nginx'
-You then need to build and upgrade the systems one by one. Do this from the
-'devel-system' machine in the same OpenStack cloud that hosts the
-infrastructure. Baserock upgrades currently involve transferring the whole
-multi-gigabyte system image, so you *must* have a fast connection to the
-target.
+You can see what updates are available using the `apt-cache policy` command,
+which also tells you which version is currently installed.
-Each Baserock system has its own deployment instructions. Each should have
-a deployment .morph file that you can pass to `morph upgrade`. For example,
-to deploy an upgrade git.baserock.org:
+    ansible -i hosts ubuntu -m command -a 'apt-cache policy nginx'
- morph upgrade --local-changes=ignore \
- baserock_trove/baserock_trove.morph gbo.VERSION_LABEL=2016-02-19
+You can then use `apt -y upgrade` to install all available updates. Or use
+`apt-get --only-upgrade install <package name>` to update just that package.
-Once this completes successfully, rebooting the system should bring up the
-new system. You may want to check that the new `/etc` is correct; you can
-do this inside the machine by mounting `/dev/vda` and looking in `systems/$VERSION_LABEL/run/etc`.
+You will then need to restart services, but rebooting the whole machine is
+probably easiest.
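+
+For example, to restart a single service on all Ubuntu machines instead of
+rebooting (a sketch using NGINX; substitute whichever service the upgrade
+affected):
+
+    ansible -i hosts ubuntu -m service -a 'name=nginx state=restarted' --become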
-If you want to revert the upgrade, use `system-version-manager list` and
-`system-version-manager set-default <old-version>` to set the previous
-version as the default, then reboot. If the system doesn't boot at all,
-reboot it while you have the graphical console open in Horizon, and you
-should be able to press `ESC` fast enough to get the boot menu open. This
-will allow booting into previous versions of the system. (You shouldn't
-have any problems though since of course we test everything regularly).
-
-Beware of <https://storyboard.baserock.org/#!/story/77>.
-
-For cache.baserock.org, you can reuse the deployment instructions for
-git.baserock.org. Try:
-
- morph upgrade --local-changes=ignore \
- baserock_trove/baserock_trove.morph \
- gbo.update-location=root@cache.baserock.org
- gbo.VERSION_LABEL=2016-02-19
Deployment to OpenStack
-----------------------
@@ -233,40 +132,8 @@ according to the OpenStack host you are deploying to:
- `OS_USERNAME`
- `OS_PASSWORD`
-For CityCloud you also need to ensure that `OS_REGION_NAME` is set to `Lon1`
-(for the London datacentre).
-
-When using `morph deploy` to deploy to OpenStack, you will need to set these
-variables, because currently Morph does not honour the standard ones. See:
-<https://storyboard.baserock.org/#!/story/35>.
-
- - `OPENSTACK_USER=$OS_USERNAME`
- - `OPENSTACK_PASSWORD=$OS_PASSWORD`
- - `OPENSTACK_TENANT=$OS_TENANT_NAME`
-
-The `location` field in the deployment .morph file will also need to point to
-the correct `$OS_AUTH_URL`.
-
-### Firewall / Security Groups
-
-The instructions assume the presence of a set of security groups. You can
-create these by running the following Ansible playbook.
-
- ansible-playbook -i hosts firewall.yaml
-
-### Placeholders
-
-The commands below use a couple of placeholders like $network_id, you can set
-them in your environment to allow you to copy and paste the commands below
-as-is.
-
- - `export fedora_image_id=...` (find this with `glance image-list`)
- - `export network_id=...` (find this with `neutron net-list`)
- - `export keyname=...` (find this with `nova keypair-list`)
-
-The `$fedora_image_id` should reference a Fedora Cloud image. You can import
-these from <http://www.fedoraproject.org/>. At time of writing, these
-instructions were tested with Fedora Cloud 26 for x86_64.
+For CityCloud you also need to ensure that `OS_REGION_NAME` is set to `Fra1`
+(for the Frankfurt datacentre).
Backups
-------
@@ -277,9 +144,31 @@ system before long. The backups are taken without pausing services or
snapshotting the data, so they will not be 100% clean. The current
git.baserock.org data volume does not use LVM and cannot be easily snapshotted.
+> Note: backups are currently not running.
+
Systems
-------
+All of the servers are deployed using Terraform. Before installing any of the
+systems below, first run Terraform to create the required resources in your
+service provider:
+
+ cd terraform
+ terraform init
+ terraform apply
+
+
+This will create or modify a `terraform.tfstate` file containing the state of
+the services in the cloud. It's important to keep this file in Git so that
+later changes can be applied on top of it. Make sure you don't include any
+secrets in your Terraform scripts, so that they are safe to publish in the open.
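+
+To preview what Terraform would change before applying, you can run `terraform
+plan` first (standard Terraform usage, shown here as an optional step):
+
+    terraform plan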
+
+These scripts will create:
+ - Networks, subnetworks, floating IPs
+ - Security groups
+ - Volumes
+ - Instances (servers) using all the above
+
### Front-end
The front-end provides a reverse proxy, to allow more flexible routing than
@@ -289,34 +178,24 @@ configuration.
To deploy this system:
- nova boot frontend-haproxy \
- --key-name=$keyname \
- --flavor=1C-1GB \
- --image=$fedora_image_id \
- --nic="net-id=$network_id" \
- --security-groups default,shared-artifact-cache,web-server \
- --user-data ./baserock-ops-team.cloud-config
ansible-playbook -i hosts baserock_frontend/image-config.yml
ansible-playbook -i hosts baserock_frontend/instance-config.yml \
--vault-password-file=~/vault-infra-pass
- ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml
+ # backups not being done at the moment
+ # ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml
- ansible -i hosts -m service -a 'name=haproxy enabled=true state=started' \
- --sudo frontend-haproxy
The baserock_frontend system is stateless.
-Full HAProxy 1.5 documentation: <https://cbonte.github.io/haproxy-dconv/configuration-1.5.html>.
+Full HAProxy 2.0 documentation: <https://cbonte.github.io/haproxy-dconv/2.0/configuration.html>.
If you want to add a new service to the Baserock Project infrastructure via
the frontend, do the following:
-- request a subdomain that points at 37.153.173.19 (frontend)
+- request a subdomain that points at the frontend IP
- alter the haproxy.cfg file in the baserock_frontend/ directory in this repo
as necessary to proxy requests to the real instance
- run the baserock_frontend/instance-config.yml playbook
-- run `ansible -i hosts -m service -a 'name=haproxy enabled=true
- state=restarted' --sudo frontend-haproxy`
OpenStack doesn't provide any kind of internal DNS service, so you must put the
fixed IP of each instance in `haproxy.cfg`.
@@ -333,21 +212,6 @@ pastebin service.
To deploy to production:
- openstack volume create \
- --description 'Webserver volume' \
- --size 150 \
- webserver-volume
-
- nova boot webserver \
- --key-name $keyname \
- --flavor 2C-8GB \
- --image $fedora_image_id \
- --nic "net-id=$network_id" \
- --security-groups default,web-server,haste-server,gitlab-bot \
- --user-data ./baserock-ops-team.cloud-config
-
- nova volume-attach webserver <volume-id> /dev/vdb
-
ansible-playbook -i hosts baserock_webserver/image-config.yml
ansible-playbook -i hosts baserock_webserver/instance-config.yml
ansible-playbook -i hosts baserock_webserver/instance-gitlabirced-config.yml \
@@ -356,176 +220,31 @@ To deploy to production:
--vault-password-file ~/vault-infra-pass
ansible-playbook -i hosts baserock_webserver/instance-irclogs-config.yml
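+
+The playbooks above read secrets from files encrypted with Ansible Vault (hence
+the `--vault-password-file` option). To inspect or modify one of them
+(hypothetical path shown; use whichever encrypted file the playbook references):
+
+    ansible-vault edit private/some-secrets.yml --vault-password-file ~/vault-infra-pass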
-The webserver machine runs [Cherokee](http://cherokee-project.com/). You
-can use the `cherokee-admin` configuration UI, by connecting to the webserver
-over SSH and including this in your SSH commandlines: `-L9090:localhost:9090`.
-When you run `sudo cherokee-admin` on the server, you'll be able to browse to
-it locally on your machine at `https://localhost:9090/`. You also have to
-modify the security groups temporarily to allow that port through.
-
### Trove
-To deploy to production, run these commands in a Baserock 'devel'
-or 'build' system.
-
- nova volume-create \
- --display-name git.baserock.org-home \
- --display-description '/home partition of git.baserock.org' \
- --volume-type Ceph \
- 300
-
- git clone git://git.baserock.org/baserock/baserock/infrastructure.git
- cd infrastructure
-
- morph build systems/trove-system-x86_64.morph
- morph deploy baserock_trove/baserock_trove.morph
-
- nova boot git.baserock.org \
- --key-name $keyname \
- --flavor 'dc1.8x16' \
- --image baserock_trove \
- --nic "net-id=$network_id,v4-fixed-ip=192.168.222.58" \
- --security-groups default,git-server,web-server,shared-artifact-cache \
- --user-data baserock-ops-team.cloud-config
-
- nova volume-attach git.baserock.org <volume-id> /dev/vdb
+Deployment of Trove is done using [Lorry Depot]. To deploy it, run:
- # Note, if this floating IP is not available, you will have to change
- # the DNS in the DNS provider.
- nova add-floating-ip git.baserock.org 37.153.173.36
+ git clone https://gitlab.com/CodethinkLabs/lorry/lorry-depot
+ cd lorry-depot
+ git clone https://gitlab.com/baserock/git.baserock.org.git
+ ansible-playbook -i git.baserock.org/static-inventory.yml lorry-depots.yml
- ansible-playbook -i hosts baserock_trove/instance-config.yml
-
- # Before configuring the Trove you will need to create some ssh
- # keys for it. You can also use existing keys.
-
- mkdir private
- ssh-keygen -N '' -f private/lorry.key
- ssh-keygen -N '' -f private/worker.key
- ssh-keygen -N '' -f private/admin.key
-
- # Now you can finish the configuration of the Trove with:
-
- ansible-playbook -i hosts baserock_trove/configure-trove.yml
### OSTree artifact cache
To deploy this system to production:
- openstack volume create \
- --description 'OSTree cache volume' \
- --size 300 \
- ostree-volume
-
- nova boot ostree.baserock.org \
- --key-name $keyname \
- --flavor 2C-8GB \
- --image $fedora_image_id \
- --nic "net-id=$network_id" \
- --security-groups default,web-server \
- --user-data ./baserock-ops-team.cloud-config
-
- nova volume-attach ostree.baserock.org <volume-id> /dev/vdb
-
ansible-playbook -i hosts baserock_ostree/image-config.yml
ansible-playbook -i hosts baserock_ostree/instance-config.yml
ansible-playbook -i hosts baserock_ostree/ostree-access-config.yml
+
SSL certificates
================
The certificates used for our infrastructure are provided for free
-by Let's Encrypt. These certificates expire every 3 months. Here we
-will explain how to renew the certificates, and how to deploy them.
-
-Generation of certificates
---------------------------
-
-> Note: This should be automated in the next upgrade. The instructions
-> sound like a lot of effort
-
-To generate the SSL certs, first you need to clone the following repositories:
-
- git clone https://github.com/lukas2511/letsencrypt.sh.git
- git clone https://github.com/mythic-beasts/letsencrypt-mythic-dns01.git
- # The newest version of the script fails to authenticate, move to a known
- # working version.
- cd letsencrypt-mythic-dns01
- git checkout 3ce4c7a367f35122acbbf496f498114364f6cfa6
- cd ..
-
-The version used the first time was `0.4.0` with sha `116386486b3749e4c5e1b4da35904f30f8b2749b`,
-(just in case future releases break these instructions)
-
-Now inside of the repo, create a `domains.txt` file with the information
-of the subdomains:
-
- cd letsencrypt.sh
- cat >domains.txt <<'EOF'
- *.baserock.org > star_baserock_org
- EOF
-
-And the `config` file needed:
-
- cat >config <<'EOF'
- CONTACT_EMAIL="admin@baserock.org"
- HOOK="../letsencrypt-mythic-dns01/letsencrypt-mythic-dns01.sh"
- CHALLENGETYPE="dns-01"
- EOF
-
-Create a `dnsapi.config.txt` with the contents of `private/dnsapi.config.txt`
-decrypted. To show the contents of this file, run the following in a
-`infrastructure.git` repo checkout.
-
- ansible-vault view ../private/dnsapi.config.txt --ask-vault-pass > dnsapi.config.txt
-
-
-Now, to generate the certs, run:
-
- ./dehydrated -c
-
-> If this is the first time, you will get asked to run
-> `./dehydrated --register --accept-terms`
-
-In the `certs` folder you will have all the certificates generated. To construct the
-certificates that are present in `certs` and `private` you will have to:
-
- cd certs
- mkdir -p tmp/private tmp/certs
-
- # Create some full certs including key for some services that need it this way
- cat star_baserock_org/cert.csr star_baserock_org/cert.pem star_baserock_org/chain.pem star_baserock_org/privkey.pem > tmp/private/frontend-with-key.pem
-
- # Copy key files
- cp star_baserock_org/privkey.pem tmp/private/frontend.pem
-
- # Copy cert files
- cp star_baserock_org/cert.csr tmp/certs/frontend.csr
- cp star_baserock_org/cert.pem tmp/certs/frontend.pem
- cp star_baserock_org/chain.pem tmp/certs/frontend-chain.pem
-
- # Create full certs without keys
- cat star_baserock_org/cert.csr star_baserock_org/cert.pem star_baserock_org/chain.pem > tmp/certs/frontend-full.pem
-
-Before replacing the current ones, make sure you **encrypt** the ones that contain
-keys (located in `private` folder):
-
- ansible-vault encrypt tmp/private/*
-
-And copy them to the repo:
-
- cp tmp/certs/* ../../certs/
- cp tmp/private/* ../../private/
-
-
-Deploy certificates
--------------------
-
-For the frontend, run:
-
- ansible-playbook -i hosts baserock_frontend/instance-config.yml
-
-Which will install the certificates and then restart the services needed.
+by Let's Encrypt. These certificates expire every 3 months, but are
+automatically renewed by certbot.
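+
+To confirm that automatic renewal works, certbot provides a dry-run mode (a
+standard certbot command; run it on the machine that holds the certificates):
+
+    sudo certbot renew --dry-run
+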
GitLab CI runners setup
@@ -562,3 +281,4 @@ The Minio cache is used for the [distributed caching] feature of GitLab CI.
[Minio]: https://www.minio.io/
['runners.docker' section]: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section
[distributed caching]: https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching
+[Lorry Depot]: https://gitlab.com/CodethinkLabs/lorry/lorry-depot