Baserock project public infrastructure
======================================

This repository contains the definitions for all of the Baserock Project's
infrastructure. This includes every service used by the project, except for
the mailing lists (hosted by [Pepperfish]), the wiki (hosted by [Branchable])
and the GitLab CI runners (set up by Javier Jardón).

Some of these systems are Baserock systems. This has proved an obstacle to
keeping them up to date with security updates, and we plan to switch
everything to run on mainstream distros in future.

All files necessary for (re)deploying the systems should be contained in this
Git repository. Private tokens should be encrypted using
[ansible-vault](https://www.ansible.com/blog/2014/02/19/ansible-vault).

[Pepperfish]: http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo
[Branchable]: http://www.branchable.com/

General notes
-------------

When instantiating a machine that will be public, remember to give shell
access to everyone on the ops team. This can be done using a post-creation
customisation script that injects all of their SSH keys.

Additionally, ensure SSH password login is disabled in all systems you
deploy! See: for why. The Ansible playbook `admin/sshd_config.yaml` can
ensure that all systems have password login disabled and all the SSH keys
installed:

    ansible-playbook -i hosts admin/sshd_config.yaml

Administration
--------------

You can use [Ansible] to automate tasks on the baserock.org systems.

To run a playbook:

    ansible-playbook -i hosts $PLAYBOOK.yaml

To run an ad-hoc command (upgrading, for example):

    ansible -i hosts ubuntu -m command -a 'sudo apt -y upgrade'

[Ansible]: http://www.ansible.com

Security updates
----------------

The [LWN Alerts](https://lwn.net/Alerts/) service gives you security alerts
from all major Linux distributions. If a vulnerability is discovered in some
software we use, we might need to upgrade all of the systems that use that
component at baserock.org.

Bear in mind that some systems are not accessible except via the
frontend-haproxy system. Those are usually less at risk than those that face
the web directly. Also bear in mind that we use OpenStack security groups to
block most ports.

### Check the inventory

Make sure the Ansible inventory file is up to date, and that you have access
to all machines. Run this:

    ansible \* -i ./hosts -m ping

You should see lots of this sort of output:

    frontend-haproxy | success >> {
        "changed": false,
        "ping": "pong"
    }

You may find some host key errors like this:

    paste | FAILED => SSH Error: Host key verification failed.

It is sometimes useful to re-run the command using -vvvv, which prints SSH
debug output to help diagnose the issue.

If you have a host key problem, that could be because somebody redeployed the
system since the last time you connected to it with SSH, and did not transfer
the SSH host keys from the old system to the new one. Check with other ops
team members about this. If you are sure the new host keys can be trusted,
you can remove the old ones with `ssh-keygen -R 10.3.x.y`, where 10.3.x.y is
the internal IP address of the machine. You'll then be prompted to accept the
new ones when you run Ansible again.

Once all machines respond to the Ansible 'ping' module, double check that
every machine you can see in the OpenStack Horizon dashboard has a
corresponding entry in the 'hosts' file, to ensure the next steps operate on
all of the machines.
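One way to do this cross-check from a terminal instead of by eye is sketched
below. It assumes the `openstack` client is configured with the credentials
described under "Deployment to OpenStack", that the inventory is a plain
INI-style `hosts` file, and that inventory host names match the OpenStack
instance names; adjust to taste.

    # Instance names known to OpenStack
    openstack server list -f value -c Name | sort > /tmp/openstack-names

    # Host names from the Ansible inventory (drop group headers and blank lines)
    grep -v '^\[' hosts | awk 'NF {print $1}' | sort > /tmp/inventory-names

    # Any output here is a machine missing from one side or the other
    diff /tmp/openstack-names /tmp/inventory-names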
### Check and update Debian/Ubuntu systems

Check what version of a package is in use with this command (using NGINX as
an example):

    ansible ubuntu -i hosts -m command -a 'dpkg -s nginx'

You can see what updates are available using the `apt-cache policy` command,
which also gives you information about the installed version:

    ansible -i hosts ubuntu -m command -a 'apt-cache policy nginx'

You can then use `apt -y upgrade` to install all available updates, or
`apt-get --only-upgrade install <package>` to update just that package. You
will then need to restart services, but rebooting the whole machine is
probably easiest.

Deployment to OpenStack
-----------------------

The intention is that all of the systems defined here are deployed to an
OpenStack cloud. The instructions here hardcode some details about the
specific tenancy at [CityCloud](https://citycontrolpanel.com/) that the
Baserock project uses. It should be easy to adapt them for other OpenStack
hosts, though.

### Credentials

The instructions below assume you have the following environment variables
set according to the OpenStack host you are deploying to:

- `OS_AUTH_URL`
- `OS_TENANT_NAME`
- `OS_USERNAME`
- `OS_PASSWORD`

For CityCloud you also need to ensure that `OS_REGION_NAME` is set to `Fra1`
(for the Frankfurt datacentre).

Backups
-------

Backups of git.baserock.org's data volume are run by, and stored on, a
Codethink-managed machine named 'access'. They will need to be migrated off
this system before long. The backups are taken without pausing services or
snapshotting the data, so they will not be 100% clean. The current
git.baserock.org data volume does not use LVM and cannot be easily
snapshotted.

> Note: backups are currently not running.

Systems
-------

All of the servers needed are deployed using Terraform. To install the
systems below, first run Terraform to create all the needed resources in
your service provider:

    cd terraform
    terraform init
    terraform apply

This will create/update the tfstate currently stored in OpenStack (via
Swift). If you want to download the state file you can run the following
command, but this isn't necessary:

    openstack object save terraform-state-baserock tfstate.tf

> The `tfstate` is common for everyone. It is not recommended to have
> multiple people working at the same time on the Terraform side of the
> infrastructure.

These scripts will create:

- Networks, subnetworks, floating IPs
- Security groups
- Volumes
- Instances (servers) using all the above

### Front-end

The front-end provides a reverse proxy, to allow more flexible routing than
simply pointing each subdomain to a different instance using separate public
IPs. It also provides a starting point for future load-balancing and
failover configuration.

To deploy this system:

    ansible-playbook -i hosts baserock_frontend/image-config.yml
    ansible-playbook -i hosts baserock_frontend/instance-config.yml \
        --vault-password-file=~/vault-infra-pass
    # backups not being done at the moment
    # ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml

The baserock_frontend system is stateless.

Full HAProxy 2.0 documentation: .
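Before running the playbook it can be worth sanity-checking the proxy
configuration locally. This is a sketch that assumes HAProxy is installed on
your workstation and that the configuration lives at the path used in this
repository:

    haproxy -c -f baserock_frontend/haproxy.cfg

A zero exit status means the configuration parses cleanly; it does not prove
that the backends are reachable.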
If you want to add a new service to the Baserock Project infrastructure via
the frontend, do the following:

- request a subdomain that points at the frontend IP
- alter the haproxy.cfg file in the baserock_frontend/ directory in this
  repo as necessary to proxy requests to the real instance
- run the baserock_frontend/instance-config.yml playbook

OpenStack doesn't provide any kind of internal DNS service, so you must use
the fixed IP of each instance. The internal IP address of this machine is
hardcoded in some places (beyond the usual haproxy.cfg file); use `git grep`
to find all of them. You'll need to update all the relevant config files. We
really need some internal DNS system to avoid this hassle.

### General webserver

The general-purpose webserver provides downloads, plus IRC logging and a
pastebin service.

To deploy to production:

    ansible-playbook -i hosts baserock_webserver/image-config.yml
    ansible-playbook -i hosts baserock_webserver/instance-config.yml
    ansible-playbook -i hosts baserock_webserver/instance-gitlabirced-config.yml \
        --vault-password-file ~/vault-infra-pass
    ansible-playbook -i hosts baserock_webserver/instance-hastebin-config.yml \
        --vault-password-file ~/vault-infra-pass
    ansible-playbook -i hosts baserock_webserver/instance-irclogs-config.yml

### Trove

Deployment of Trove is done using [Lorry Depot]. To do so, run:

    git clone https://gitlab.com/CodethinkLabs/lorry/lorry-depot
    cd lorry-depot
    git clone https://gitlab.com/baserock/git.baserock.org.git
    ansible-playbook -i git.baserock.org/static-inventory.yml lorry-depots.yml

### OSTree artifact cache

To deploy this system to production:

    ansible-playbook -i hosts baserock_ostree/image-config.yml
    ansible-playbook -i hosts baserock_ostree/instance-config.yml
    ansible-playbook -i hosts baserock_ostree/ostree-access-config.yml

SSL certificates
================

The certificates used for our infrastructure are provided for free by Let's
Encrypt. These certificates expire every 3 months, but are automatically
renewed via certbot.

GitLab CI runners setup
=======================

Baserock uses [GitLab CI] for build and test automation. For performance
reasons we provide our own runners and avoid using the free, shared runners
provided by GitLab. The runners are hosted at [DigitalOcean] and managed by
the 'baserock' team account there.

There is a persistent 'manager' machine with a public IP of 138.68.150.249
that runs GitLab Runner and [docker-machine]. This doesn't run any builds
itself -- we use the [autoscaling feature] of GitLab Runner to spawn new VMs
for building in. The configuration for this is in
`/etc/gitlab-runner/config.toml`.

Each build occurs in a Docker container on one of the transient VMs. As per
the ['runners.docker' section] of `config.toml`, each gets a newly created
volume mounted at `/cache`. The YBD and BuildStream cache directories get
located here, because jobs were running out of disk space when using the
default configuration.

There is a second persistent machine with a public IP of 46.101.48.48 that
hosts a Docker registry and a [Minio] cache. These services run as Docker
containers. The Docker registry exists to cache the Docker images we use,
which improves the spin-up time of the transient builder VMs, as documented
[here](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-docker-registry-mirroring).
The Minio cache is used for the [distributed caching] feature of GitLab CI.
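If you need to check the state of this setup, a few read-only commands are
usually enough. This is a sketch that assumes you have SSH access to both
machines; the user accounts and sudo policy on them are not recorded here:

    # On the 'manager' machine (138.68.150.249)
    docker-machine ls                        # transient builder VMs created by the autoscaler
    sudo gitlab-runner verify                # confirm the runners are still registered with GitLab
    sudo cat /etc/gitlab-runner/config.toml  # autoscaling and cache configuration

    # On the registry/cache machine (46.101.48.48)
    docker ps --format '{{.Names}}\t{{.Image}}\t{{.Status}}'   # registry and Minio containers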
[GitLab CI]: https://about.gitlab.com/features/gitlab-ci-cd/
[DigitalOcean]: https://cloud.digitalocean.com/
[docker-machine]: https://docs.docker.com/machine/
[autoscaling feature]: https://docs.gitlab.com/runner/configuration/autoscale.html
[Minio]: https://www.minio.io/
['runners.docker' section]: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section
[distributed caching]: https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching
[Lorry Depot]: https://gitlab.com/CodethinkLabs/lorry/lorry-depot
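As a quick check on the Let's Encrypt renewals mentioned above, certbot can
report the certificates it manages and when they expire. This sketch assumes
certbot is installed on the machine that terminates TLS for the proxied
services, which is not recorded here:

    sudo certbot certificates      # list managed certificates and their expiry dates
    sudo certbot renew --dry-run   # exercise the renewal path without changing anything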