From e609fa50c214bb87b42417cc283f70e28ecefd83 Mon Sep 17 00:00:00 2001 From: Pedro Alvarez Date: Wed, 18 Aug 2021 12:30:26 +0200 Subject: Update README for Terraform and other changes --- README.md | 398 ++++++++++---------------------------------------------------- 1 file changed, 59 insertions(+), 339 deletions(-) diff --git a/README.md b/README.md index fb404f27..3489f30c 100644 --- a/README.md +++ b/README.md @@ -23,14 +23,15 @@ General notes When instantiating a machine that will be public, remember to give shell access everyone on the ops team. This can be done using a post-creation -customisation script that injects all of their SSH keys. The SSH public -keys of the Baserock Operations team are collected in -`baserock-ops-team.cloud-config.`. +customisation script that injects all of their SSH keys. -Ensure SSH password login is disabled in all systems you deploy! See: - for why. The Ansible playbook -`admin/sshd_config.yaml` can ensure that all systems have password login -disabled. +Additionally, ensure SSH password login is disabled in all systems you deploy! +See: for why. + +The Ansible playbook `admin/sshd_config.yaml` can ensure that all systems have +password login disabled, and all the SSH keys installed. + + ansible-playbook -i git.baserock.org/static-inventory.yml lorry-depots.yml Administration @@ -44,7 +45,7 @@ To run a playbook: To run an ad-hoc command (upgrading, for example): - ansible -i hosts fedora -m command -a 'sudo dnf update -y' + ansible -i hosts ubuntu -m command -a 'sudo apt -y upgrade' [Ansible]: http://www.ansible.com @@ -52,11 +53,8 @@ To run an ad-hoc command (upgrading, for example): Security updates ---------------- -Fedora security updates can be watched here: -. -The Baserock reference systems doesn't have such a service. The [LWN -Alerts](https://lwn.net/Alerts/) service gives you info from all major Linux -distributions. +The [LWN Alerts](https://lwn.net/Alerts/) service gives you info from all major +Linux distributions. 
If there is a vulnerability discovered in some software we use, we might need to upgrade all of the systems that use that component at baserock.org. @@ -65,29 +63,6 @@ Bear in mind some systems are not accessible except via the frontend-haproxy system. Those are usually less at risk than those that face the web directly. Also bear in mind we use OpenStack security groups to block most ports. -### Prepare the patch for Baserock systems - -First, you need to update the Baserock reference system definitions with a -fixed version of the component. Build that and test that it works. Submit -the patch to gerrit.baserock.org, get it reviewed, and merged. Then cherry -pick that patch into infrastructure.git. - -This a long-winded process. There are shortcuts you can take, although -someone still has to complete the process described above at some point. - -* You can modify the infrastructure.git definitions directly and start rebuilding - the infrastructure systems right away, to avoid waiting for the Baserock patch - review process. - -* You can add the new version of the component as a stratum that sits above - everything else in the build graph. For example, to do a 'hot-fix' for GLIBC, - add a 'glibc-hotfix' stratum containing the new version to all of the systems - you need to upgrade. Rebuilding them will be quick because you just need to - build GLIBC, and can reuse the cached artifacts for everything else. The new - GLIBC will overwrite the one that is lower down in the build graph in the - resulting filesystem. Of course, if the new version of the component is not - ABI compatible then this approach will break things. Be careful. - ### Check the inventory Make sure the Ansible inventory file is up to date, and that you have access to @@ -120,100 +95,24 @@ every machine you can see in the OpenStack Horizon dashboard has a corresponding entry in the 'hosts' file, to ensure the next steps operate on all of the machines. 
-### Check and upgrade Fedora systems - -> Bear in mind that only the latest 2 versions of Fedora receive security -updates. If any machines are not running the latest version of Fedora, -you should redeploy them with the latest version. See the instructions below -on how to (re)deploy each machine. You should deploy a new instance of a system -and test it *before* terminating the existing instance. Switching over should -be a matter of changing either its floating IP address or the IP address in -baserock_frontend/haproxy.conf. - -You can find out what version of Fedora is in use with this command: - - ansible fedora -i hosts -m setup -a 'filter=ansible_distribution_version' - -Check what version of a package is in use with this command (using GLIBC as an -example). You can compare this against Fedora package changelogs at -[Koji](https://koji.fedoraproject.org). - - ansible fedora -i hosts -m command -a 'rpm -q glibc --qf "%{VERSION}.%{RELEASE}\n"' - -You can see what updates are available using the `dnf updateinfo info' command. - - ansible -i hosts fedora -m command -a 'dnf updateinfo info glibc' - -You can then use `dnf upgrade -y` to install all available updates. Or give the -name of a package to update just that package. Be aware that DNF is quite slow, -and if you forget to pass `-y` then it will hang forever waiting for input. - -You will then need to restart services. The `dnf needs-restarting` command might be -useful, but rebooting the whole machine is probably easiest. - -### Check and upgrade Baserock systems - -Check what version of a given package is in use with this command (using GLIBC -as an example). Ideally Baserock reference systems would have a query tool for -this info, but for now we have to look at the JSON metadata file directly. - - ansible -i hosts baserock -m command \ - -a "grep '\"\(sha1\|repo\|original_ref\)\":' /baserock/glibc-bins.meta" - -The default Baserock machine layout uses Btrfs for the root filesystem. 
Filling -up a Btrfs disk results in unpredictable behaviour. Before deploying any system -upgrades, check that each machine has enough free disk space to hold an -upgrade. Allow for at least 4GB free space, to be safe. - - ansible -i hosts baserock -m command -a "df -h /" - -A good way to free up space is to remove old system-versions using the -`system-version-manager` tool. There may be other things that are -unnecessarily taking up space in the root file system, too. +### Check and update Debian/Ubuntu systems -Ideally, at this point you've prepared a patch for definitions.git to fix -the security issue in the Baserock reference systems, and it has been merged. -In that case, pull from the reference systems into infrastructure.git, using -`git pull git://git.baserock.org/baserock/baserock/definitions master`. +Check what version of a package is in use with this command (using NGINX as an +example). -If the necessary patch isn't merged in definitions.git, it's still best to -merge 'master' from there into infrastructure.git, and then cherry-pick the -patch from Gerrit on top. + ansible ubuntu -i hosts -m command -a 'dpkg -s nginx' -You then need to build and upgrade the systems one by one. Do this from the -'devel-system' machine in the same OpenStack cloud that hosts the -infrastructure. Baserock upgrades currently involve transferring the whole -multi-gigabyte system image, so you *must* have a fast connection to the -target. +You can see what updates are available using the `apt-cache policy' command, +which also gives you information about the installed one. -Each Baserock system has its own deployment instructions. Each should have -a deployment .morph file that you can pass to `morph upgrade`. 
For example, -to deploy an upgrade git.baserock.org: + ansible -i hosts fedora -m command -a 'apt-cache policy nginx' - morph upgrade --local-changes=ignore \ - baserock_trove/baserock_trove.morph gbo.VERSION_LABEL=2016-02-19 +You can then use `apt -y upgrade` to install all available updates. Or use +`apt-get --only-upgrade install ` to update just that package. -Once this completes successfully, rebooting the system should bring up the -new system. You may want to check that the new `/etc` is correct; you can -do this inside the machine by mounting `/dev/vda` and looking in `systems/$VERSION_LABEL/run/etc`. +You will then need to restart services, but rebooting the whole machine is +probably easiest. -If you want to revert the upgrade, use `system-version-manager list` and -`system-version-manager set-default ` to set the previous -version as the default, then reboot. If the system doesn't boot at all, -reboot it while you have the graphical console open in Horizon, and you -should be able to press `ESC` fast enough to get the boot menu open. This -will allow booting into previous versions of the system. (You shouldn't -have any problems though since of course we test everything regularly). - -Beware of . - -For cache.baserock.org, you can reuse the deployment instructions for -git.baserock.org. Try: - - morph upgrade --local-changes=ignore \ - baserock_trove/baserock_trove.morph \ - gbo.update-location=root@cache.baserock.org - gbo.VERSION_LABEL=2016-02-19 Deployment to OpenStack ----------------------- @@ -233,40 +132,8 @@ according to the OpenStack host you are deploying to: - `OS_USERNAME` - `OS_PASSWORD` -For CityCloud you also need to ensure that `OS_REGION_NAME` is set to `Lon1` -(for the London datacentre). - -When using `morph deploy` to deploy to OpenStack, you will need to set these -variables, because currently Morph does not honour the standard ones. See: -. 
- - - `OPENSTACK_USER=$OS_USERNAME` - - `OPENSTACK_PASSWORD=$OS_PASSWORD` - - `OPENSTACK_TENANT=$OS_TENANT_NAME` - -The `location` field in the deployment .morph file will also need to point to -the correct `$OS_AUTH_URL`. - -### Firewall / Security Groups - -The instructions assume the presence of a set of security groups. You can -create these by running the following Ansible playbook. - - ansible-playbook -i hosts firewall.yaml - -### Placeholders - -The commands below use a couple of placeholders like $network_id, you can set -them in your environment to allow you to copy and paste the commands below -as-is. - - - `export fedora_image_id=...` (find this with `glance image-list`) - - `export network_id=...` (find this with `neutron net-list`) - - `export keyname=...` (find this with `nova keypair-list`) - -The `$fedora_image_id` should reference a Fedora Cloud image. You can import -these from . At time of writing, these -instructions were tested with Fedora Cloud 26 for x86_64. +For CityCloud you also need to ensure that `OS_REGION_NAME` is set to `Fra1` +(for the Frankfurt datacentre). Backups ------- @@ -277,9 +144,31 @@ system before long. The backups are taken without pausing services or snapshotting the data, so they will not be 100% clean. The current git.baserock.org data volume does not use LVM and cannot be easily snapshotted. +> Note: backups currently not running + Systems ------- +All the servers needed are deployed using Terraform. To install all the systems +below, you need to first run Terraform to create all the needed bits in your +service provider + + cd terraform + terraform init + terraform apply + + +This will create/modify a `terraform.tfstate` file containing the status of +of the services in the cloud. It's important to keep it in Git so that +later changes can be applied. Make sure you don't include any secrets in +your Terraform scripts, so that it's safe to publish in the open. 
+ +These scripts will create: + - Networks, subnetworks, floating IPs + - Security groups + - Volumes + - Instances (servers) using all the above + ### Front-end The front-end provides a reverse proxy, to allow more flexible routing than @@ -289,34 +178,24 @@ configuration. To deploy this system: - nova boot frontend-haproxy \ - --key-name=$keyname \ - --flavor=1C-1GB \ - --image=$fedora_image_id \ - --nic="net-id=$network_id" \ - --security-groups default,shared-artifact-cache,web-server \ - --user-data ./baserock-ops-team.cloud-config ansible-playbook -i hosts baserock_frontend/image-config.yml ansible-playbook -i hosts baserock_frontend/instance-config.yml \ --vault-password-file=~/vault-infra-pass - ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml + # backups not being done at the moment + # ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml - ansible -i hosts -m service -a 'name=haproxy enabled=true state=started' \ - --sudo frontend-haproxy The baserock_frontend system is stateless. -Full HAProxy 1.5 documentation: . +Full HAProxy 2.0 documentation: . If you want to add a new service to the Baserock Project infrastructure via the frontend, do the following: -- request a subdomain that points at 37.153.173.19 (frontend) +- request a subdomain that points at the frontend IP - alter the haproxy.cfg file in the baserock_frontend/ directory in this repo as necessary to proxy requests to the real instance - run the baserock_frontend/instance-config.yml playbook -- run `ansible -i hosts -m service -a 'name=haproxy enabled=true - state=restarted' --sudo frontend-haproxy` OpenStack doesn't provide any kind of internal DNS service, so you must put the fixed IP of each instance. @@ -333,21 +212,6 @@ pastebin service. 
To deploy to production: - openstack volume create \ - --description 'Webserver volume' \ - --size 150 \ - webserver-volume - - nova boot webserver \ - --key-name $keyname \ - --flavor 2C-8GB \ - --image $fedora_image_id \ - --nic "net-id=$network_id" \ - --security-groups default,web-server,haste-server,gitlab-bot \ - --user-data ./baserock-ops-team.cloud-config - - nova volume-attach webserver /dev/vdb - ansible-playbook -i hosts baserock_webserver/image-config.yml ansible-playbook -i hosts baserock_webserver/instance-config.yml ansible-playbook -i hosts baserock_webserver/instance-gitlabirced-config.yml \ @@ -356,176 +220,31 @@ To deploy to production: --vault-password-file ~/vault-infra-pass ansible-playbook -i hosts baserock_webserver/instance-irclogs-config.yml -The webserver machine runs [Cherokee](http://cherokee-project.com/). You -can use the `cherokee-admin` configuration UI, by connecting to the webserver -over SSH and including this in your SSH commandlines: `-L9090:localhost:9090`. -When you run `sudo cherokee-admin` on the server, you'll be able to browse to -it locally on your machine at `https://localhost:9090/`. You also have to -modify the security groups temporarily to allow that port through. - ### Trove -To deploy to production, run these commands in a Baserock 'devel' -or 'build' system. 
- - nova volume-create \ - --display-name git.baserock.org-home \ - --display-description '/home partition of git.baserock.org' \ - --volume-type Ceph \ - 300 - - git clone git://git.baserock.org/baserock/baserock/infrastructure.git - cd infrastructure - - morph build systems/trove-system-x86_64.morph - morph deploy baserock_trove/baserock_trove.morph - - nova boot git.baserock.org \ - --key-name $keyname \ - --flavor 'dc1.8x16' \ - --image baserock_trove \ - --nic "net-id=$network_id,v4-fixed-ip=192.168.222.58" \ - --security-groups default,git-server,web-server,shared-artifact-cache \ - --user-data baserock-ops-team.cloud-config - - nova volume-attach git.baserock.org /dev/vdb +Deployment of Trove is done using [Lorry Depot]. To do so you can: - # Note, if this floating IP is not available, you will have to change - # the DNS in the DNS provider. - nova add-floating-ip git.baserock.org 37.153.173.36 + git clone https://gitlab.com/CodethinkLabs/lorry/lorry-depot + cd lorry-depot + git clone https://gitlab.com/baserock/git.baserock.org.git + ansible-playbook -i git.baserock.org/static-inventory.yml lorry-depots.yml - ansible-playbook -i hosts baserock_trove/instance-config.yml - - # Before configuring the Trove you will need to create some ssh - # keys for it. You can also use existing keys. 
- - mkdir private - ssh-keygen -N '' -f private/lorry.key - ssh-keygen -N '' -f private/worker.key - ssh-keygen -N '' -f private/admin.key - - # Now you can finish the configuration of the Trove with: - - ansible-playbook -i hosts baserock_trove/configure-trove.yml ### OSTree artifact cache To deploy this system to production: - openstack volume create \ - --description 'OSTree cache volume' \ - --size 300 \ - ostree-volume - - nova boot ostree.baserock.org \ - --key-name $keyname \ - --flavor 2C-8GB \ - --image $fedora_image_id \ - --nic "net-id=$network_id" \ - --security-groups default,web-server \ - --user-data ./baserock-ops-team.cloud-config - - nova volume-attach ostree.baserock.org /dev/vdb - ansible-playbook -i hosts baserock_ostree/image-config.yml ansible-playbook -i hosts baserock_ostree/instance-config.yml ansible-playbook -i hosts baserock_ostree/ostree-access-config.yml + SSL certificates ================ The certificates used for our infrastructure are provided for free -by Let's Encrypt. These certificates expire every 3 months. Here we -will explain how to renew the certificates, and how to deploy them. - -Generation of certificates --------------------------- - -> Note: This should be automated in the next upgrade. The instructions -> sound like a lot of effort - -To generate the SSL certs, first you need to clone the following repositories: - - git clone https://github.com/lukas2511/letsencrypt.sh.git - git clone https://github.com/mythic-beasts/letsencrypt-mythic-dns01.git - # The newest version of the script fails to authenticate, move to a known - # working version. - cd letsencrypt-mythic-dns01 - git checkout 3ce4c7a367f35122acbbf496f498114364f6cfa6 - cd .. 
- -The version used the first time was `0.4.0` with sha `116386486b3749e4c5e1b4da35904f30f8b2749b`, -(just in case future releases break these instructions) - -Now inside of the repo, create a `domains.txt` file with the information -of the subdomains: - - cd letsencrypt.sh - cat >domains.txt <<'EOF' - *.baserock.org > star_baserock_org - EOF - -And the `config` file needed: - - cat >config <<'EOF' - CONTACT_EMAIL="admin@baserock.org" - HOOK="../letsencrypt-mythic-dns01/letsencrypt-mythic-dns01.sh" - CHALLENGETYPE="dns-01" - EOF - -Create a `dnsapi.config.txt` with the contents of `private/dnsapi.config.txt` -decrypted. To show the contents of this file, run the following in a -`infrastructure.git` repo checkout. - - ansible-vault view ../private/dnsapi.config.txt --ask-vault-pass > dnsapi.config.txt - - -Now, to generate the certs, run: - - ./dehydrated -c - -> If this is the first time, you will get asked to run -> `./dehydrated --register --accept-terms` - -In the `certs` folder you will have all the certificates generated. 
To construct the -certificates that are present in `certs` and `private` you will have to: - - cd certs - mkdir -p tmp/private tmp/certs - - # Create some full certs including key for some services that need it this way - cat star_baserock_org/cert.csr star_baserock_org/cert.pem star_baserock_org/chain.pem star_baserock_org/privkey.pem > tmp/private/frontend-with-key.pem - - # Copy key files - cp star_baserock_org/privkey.pem tmp/private/frontend.pem - - # Copy cert files - cp star_baserock_org/cert.csr tmp/certs/frontend.csr - cp star_baserock_org/cert.pem tmp/certs/frontend.pem - cp star_baserock_org/chain.pem tmp/certs/frontend-chain.pem - - # Create full certs without keys - cat star_baserock_org/cert.csr star_baserock_org/cert.pem star_baserock_org/chain.pem > tmp/certs/frontend-full.pem - -Before replacing the current ones, make sure you **encrypt** the ones that contain -keys (located in `private` folder): - - ansible-vault encrypt tmp/private/* - -And copy them to the repo: - - cp tmp/certs/* ../../certs/ - cp tmp/private/* ../../private/ - - -Deploy certificates -------------------- - -For the frontend, run: - - ansible-playbook -i hosts baserock_frontend/instance-config.yml - -Which will install the certificates and then restart the services needed. +by Let's Encrypt. These certificates expire every 3 months, but are +automatically updated via certbot. GitLab CI runners setup @@ -562,3 +281,4 @@ The Minio cache is used for the [distributed caching] feature of GitLab CI. [Minio]: https://www.minio.io/ ['runners.docker' section]: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section [distributed caching]: https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching +[Lorry Depot]: https://gitlab.com/CodethinkLabs/lorry/lorry-depot -- cgit v1.2.1
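The "Check and update Debian/Ubuntu systems" section added by this patch relies on reading `apt-cache policy` output by eye. As a minimal sketch, assuming the usual `Installed:` / `Candidate:` lines in that output, the check could be scripted as below. The function name `policy_status` is hypothetical and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of infrastructure.git).
# Reads `apt-cache policy <package>` output on stdin and prints:
#   not-installed  when no version is installed
#   upgradable     when the Installed and Candidate versions differ
#   current        when they are the same
policy_status() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END {
            if (installed == "" || installed == "(none)")
                print "not-installed"
            # Appending "" forces a string comparison of the versions.
            else if ((installed "") != (candidate ""))
                print "upgradable"
            else
                print "current"
        }
    '
}
```

On a single machine this could be used as `apt-cache policy nginx | policy_status`. It handles one package on one host at a time, so the multi-host output of an `ansible ... -m command` run would need to be split per host first.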