Title: Baserock definitions format

The Baserock definitions format
===============================

This page describes the Baserock definitions format (morph files). It is intended to be useful as an *informal* specification. It is not guaranteed to be accurate or exhaustive.

If you are just getting started with Baserock, the wiki pages [quick-start](http://wiki.baserock.org/quick-start), [devel-with](http://wiki.baserock.org/devel-with) and [guides](http://wiki.baserock.org/guides) provide a more practical introduction.

The allowed YAML constructs are described in json-schema format here: .

The data model is described using OWL here: .

The source code of [Morph] and [YBD] might be more useful if you need a completely accurate description of how the current Baserock definition format is used in practice.

Versioning
----------

The current version of the definitions format is version 8.

Definitions repository
----------------------

The design of Baserock aims to encourage users to keep *all* the information needed to build and deploy their software in one Git repository. This repo is often referred to as the 'definitions.git' repo, although nothing forces you to call it that.

Some of this information will be Baserock 'definitions files', which describe how to build or deploy some software. Baserock tooling should expect that all the definitions it needs to process live in one Git repo. The definitions.git repo can and should contain any other files needed for build and deployment as well, such as configuration data and documentation.

The Baserock Project maintains a set of 'reference system definitions' at [git://git.baserock.org/baserock/baserock/definitions] (which can also be referred to as [baserock:baserock/definitions], when using the repo aliasing feature of [Morph]).
That repo contains systems that can be built and deployed as-is, but it is important that users can fork this repo as well, and work on systems in their version using `git merge` or `git rebase` to keep up to date with changes from upstream.

Baserock tooling should not mandate anything about the definitions repo that the user wants to process, other than the rules defined below.

[git://git.baserock.org/baserock/baserock/definitions]: http://git.baserock.org/cgit/baserock/baserock/definitions.git
[baserock:baserock/definitions]: http://git.baserock.org/cgit/baserock/baserock/definitions.git

### Structure

Tooling can enforce that the definitions.git repo is actually a Git repo, but it can equally just treat it as a tree of files and directories.

The top directory of the repo must contain a file named `VERSION`, which is valid [YAML] and contains a dict with the key "version" whose value is an integer. The integer specifies the version of the definitions format that this repo uses. A tool should refuse to process a version that it doesn't support, to avoid unpredictable errors. See also the [Versioning](#versioning) section.

The top directory of the repo can also contain a file named `DEFAULTS`. This holds repo-wide 'build-system' and 'split-rules' information. See the [defaults](#defaults) section below.

To find all the Baserock definition files in the repo, tooling can recursively scan the contents of the repo for files matching the glob pattern "\*.morph".

Definitions file syntax
-----------------------

[YAML] is used for all Baserock definitions files.

The toplevel entity in a definition is a dict, in all cases. Any syntax errors or type errors (such as the toplevel entity being a number) should be reported to the user.

The [Morph] tool raises an error if any unknown dictionary keys are found in the definition, mainly so that it reports any spelling errors in key names.

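
The repository rules above (a `VERSION` file holding an integer "version", and definitions found by scanning for "\*.morph") could be checked roughly as follows. This is only a sketch in Python, not how [Morph] or [YBD] do it: the names `SUPPORTED_VERSIONS`, `check_version` and `find_definitions` are made up for illustration, and `check_version` assumes the `VERSION` file has already been parsed by some YAML library.

```python
from pathlib import Path

# Hypothetical: the format versions this imaginary tool understands.
SUPPORTED_VERSIONS = frozenset({8})


def check_version(version_data, supported=SUPPORTED_VERSIONS):
    """Validate the already-parsed contents of the VERSION file.

    `version_data` is whatever a YAML parser returned for VERSION.
    Returns the version number, or raises ValueError.
    """
    if not isinstance(version_data, dict):
        raise ValueError("VERSION must contain a dict")
    version = version_data.get("version")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError("'version' must be an integer")
    if version not in supported:
        raise ValueError(f"unsupported definitions version: {version}")
    return version


def find_definitions(repo_root):
    """Return every *.morph file below repo_root as a sorted relative path."""
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.morph")
        if path.is_file()
    )
```

Refusing an unknown version before reading any definition keeps the failure early and obvious, which is the behaviour the text above asks for.
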
### Common fields

For all definitions, use the following fields:

* `name`: the name of the definition; it must currently match the filename (without the `.morph` suffix); **required**
* `kind`: the kind of thing being built; **required**
* `description`: a comment to describe what the definition is for; optional

Build definitions: Chunks, Systems and Strata
---------------------------------------------

Within this document, consider 'building' to be the act of running a series of commands in a given 'environment', where the commands and how to build the environment are completely specified by the definitions and the build tool.

### Chunks

A 'chunk' definition describes an individual component, which can be built from a Git repository by executing the given sequence of commands.

The structure of a 'chunk' definition is described using [JSON-Schema] in the [spec.git] repo:

The build sequence consists of four phases:

1. configure
2. build
3. install
4. strip

You can define one or more commands for each phase, or none.

Here is an example chunk:

    name: glibc
    kind: chunk
    description: GNU C library (example)
    configure-commands:
    - mkdir o
    - cd o && ../libc/configure --prefix=/usr
    build-commands:
    - cd o && make
    install-commands:
    - cd o && make install_root="$DESTDIR" install

A chunk can optionally make use of a repo-wide build system defined in ['DEFAULTS'](#defaults), by using the `build-system` field. In this case, the command sequences defined in DEFAULTS are used. Any predefined command sequence can be overridden by specifying a new value.

You can also extend a predefined build system with the `pre-` or `post-` fields. For example, `pre-configure-commands` are run directly before `configure-commands`, and all `post-configure-commands` are run directly after.

For example:

    name: git
    kind: chunk
    description: Git version control tool (example)
    build-system: autotools
    # This command will run before the normal 'configure' command sequence.
    pre-configure-commands:
    - make configure
    # This command overrides the normal 'build' command sequence.
    build-commands:
    - make all

If a chunk doesn't need to override anything from ['DEFAULTS'](#defaults), you can avoid having a chunk .morph file altogether, and just set 'build-system' when referring to the chunk from the stratum.

The `max-jobs` field can be used to pass a custom value for --jobs to Make, via the `MAKEFLAGS` environment variable. This is useful for Makefiles which don't work in parallel: you can set `max-jobs: 1` to work around the problem. Parallel jobs are only used during the `build-commands` phase, since the other phases are often not safe when run in parallel; `morph` picks a default value based on the number of CPUs on the host system.

Built chunks are split up into multiple artifacts. The default splitting rules for a chunk are defined in ['DEFAULTS'](#defaults). You can use the `chunks` field of a chunk to override these.

The `chunks` field is a key/value map of lists of regular expressions. Each key is the name of a binary chunk, and its regexps match the pathnames that will be included in that chunk. The patterns are matched against the pathnames that get installed by `install-commands` (the whole path below `DESTDIR`). Every file must be matched by at least one pattern. By default, a single chunk gets created, named according to the definition, and containing all files.

### Strata

A 'stratum' is a group of related chunks. A stratum can contain only chunks.

Certain information about how to build a chunk is defined in the containing stratum, rather than in the chunk definition.

The structure of a 'stratum' definition is described using [JSON-Schema] in the [spec.git] repo:

The fields mean the following:

* `build-depends`: a list of strings, each of which refers to another stratum that the current stratum depends on. This list may be omitted or empty if the stratum does not depend on anything else.
* `chunks`: a list of key/value mappings, where each mapping corresponds to a chunk to be included in the stratum; the mappings may use the following keys:
    - `name` is the chunk's name (may be different from the morphology name),
    - `repo` is the repository in which to find the chunk (defaults to chunk name),
    - `ref` identifies the commit to use (typically a branch name, but any tree-ish git accepts is ok)
    - `morph` is a path, relative to the top of the definitions repo, to a chunk .morph file.
    - `build-system` specifies one of the predefined build systems. You must specify ONE of `morph` or `build-system` for each chunk.
    - `submodules` is a list of key/value mappings that specifies url overrides for .gitmodules. The key should be the name of the submodule as listed in .gitmodules, not the path. The value is a dictionary containing one key/value pair:
        * `url`: The url override for the submodule; this can include aliasing.

In addition to these keys, each of the sources can specify a list of build dependencies using the `build-depends` field. To specify one or more chunk dependencies, `build-depends` needs to be set to a list that contains the names of chunks that the source depends on in the same stratum. These names correspond to the values of the `name` fields of the other chunks.

At the moment, the ordering is significant in chunk build-depends. This is used during bootstrapping, when you want to override the first build of a component with its second version in a staging area. This feature is kind of a workaround for the lack of distinction between build and runtime dependencies.

### Systems

In the Baserock model, a 'system' is the top level entity that you actually build and execute. Systems contain one or more strata.

The structure of a 'system' definition is described using [JSON-Schema] in the [spec.git] repo:

The fields mean the following:

* `strata`: a list of key/value mappings, similar to the 'chunks' field of a stratum.
Two fields are allowed (are both required?):
    - `name`: name of the artifact when the stratum is built
    - `morph`: path to a stratum .morph file relative to the top of the containing repo

### Example stratum:

    name: foundation
    kind: stratum
    chunks:
    - name: fhs-dirs
      repo: upstream:fhs-dirs
      ref: baserock/bootstrap
      build-depends: []
    - name: linux-api-headers
      repo: upstream:linux
      ref: baserock/morph
      build-depends:
      - fhs-dirs
    - name: eglibc
      repo: upstream:eglibc
      ref: baserock/bootstrap
      build-depends:
      - linux-api-headers
    - name: busybox
      repo: upstream:busybox
      ref: baserock/bootstrap
      build-depends:
      - fhs-dirs
      - linux-api-headers
    - name: ansible
      repo: upstream:ansible
      ref: v2.0
      submodules:
        lib/ansible/modules/core:
          url: upstream:ansible-modules-core
        lib/ansible/modules/extras:
          url: upstream:ansible-modules-extras

### Example system:

    name: base
    kind: system
    strata:
    - morph: foundation
    - morph: linux-stratum

Deployment definitions: Clusters
--------------------------------

For 'deployment', Baserock defines an API for running 'extensions'. The 'cluster' and 'system' definitions together describe what extensions should be run, and what should be set in their environment, in order to deploy the system. See the [Deployment](deployment) section for how to find and execute the extensions.

Within this document, consider "deployment" to be a process of first post-processing a filesystem tree with one or more 'configure extensions', then performing an operation to convert and/or transfer the filesystem tree using a 'write extension'.

The structure of the 'cluster' definitions is described using [JSON-Schema] in the [spec.git] repo:

A cluster morphology defines a list of systems to deploy, and for each system a list of ways to deploy them. The fields are used as follows:

* **systems**: a list of systems to deploy; the value is a list of mappings, where each mapping has the following keys:

    * **morph**: the system morphology to use in the specified commit.
    * **deploy**: a mapping where each key identifies a system and each system has at least the following keys:

        * **type**: identifies the relative path, without extension, to the '.write' program that should be used for this system.
        * **location**: where the deployed system should end up. The syntax depends on the '.write' extension chosen in the 'type' field.

        Optionally, it can specify **upgrade-type** and **upgrade-location** as well, which should be interpreted in the same way.

        The system dictionary can have any number of other entries. These should be collected and passed to each '.configure' extension and to the '.write' extension, through the environment. The extensions can interpret any of them in any manner.

    * **deploy-defaults**: allows multiple deployments of the same system to share some settings, when they can. Default settings will be overridden by those defined inside the deploy mapping.
    * **subsystems**: structured in the same way as the 'systems' entry, this allows deploying something *within* a system. The Baserock reference definitions use this to provide an initramfs inside some of the reference systems.

Example:

    name: cluster-foo
    kind: cluster
    systems:
    - morph: devel-system-x86_64-generic.morph
      deploy:
        cluster-foo-x86_64-1:
          type: extensions/kvm
          location: kvm+ssh://user@host/x86_64-1/x86_64-1.img
          upgrade-type: extensions/ssh-rsync
          upgrade-location: root@localhost
          HOSTNAME: cluster-foo-x86_64-1
          DISK_SIZE: 4G
          RAM_SIZE: 4G
          VCPUS: 2
    - morph: devel-system-armv7-highbank
      deploy-defaults:
        type: extensions/pxeboot
        location: cluster-foo-pxeboot-server
      deploy:
        cluster-foo-armv7-1:
          HOSTNAME: cluster-foo-armv7-1
        cluster-foo-armv7-2:
          HOSTNAME: cluster-foo-armv7-2

Repo URLs
---------

Git repository locations can (and should) be abbreviated using the 'repo-alias' feature of Baserock definitions. This is a kind of [Compact URI](http://www.w3.org/TR/2009/CR-curie-20090116/). It currently only affects the 'repo' fields in a stratum .morph file.

For example, instead of writing this:

```
- name: fhs-dirs
  repo: git://git.baserock.org/baserock/baserock/fhs-dirs.git
  ref: master
```

You can write this:

```
- name: fhs-dirs
  repo: baserock:baserock/fhs-dirs.git
  ref: master
```

There are two repo aliases that *must* be defined:

- `baserock:` (defaulting to git://git.baserock.org/baserock/)
- `upstream:` (defaulting to git://git.baserock.org/delta/)

Baserock tools should allow changing these values.

The main benefit of this compact URI scheme is that definitions are not tied to a specific Git server, or protocol. You can build against a mirror of the original Git server, or change the protocol that is used, just by altering the repo-alias configuration.

Defaults
--------

A definitions repository can contain a file named DEFAULTS. This file sets up repo-wide configuration.

The structure of the DEFAULTS file is described using [JSON-Schema] in the [spec.git] repo:

### Build systems

You can define common command sequences using the 'build-systems' key. These can then be referenced in chunk (and stratum) definitions, to avoid repeating common patterns.

Here is an example DEFAULTS file that defines the python-distutils build system:

    build-systems:
      python-distutils:
        configure-commands: []
        build-commands:
        - python setup.py build
        install-commands:
        - python setup.py install --prefix "$PREFIX" --root "$DESTDIR"

### Splitting rules

One 'source' in Baserock can produce multiple binary 'artifacts'. This allows you to separate programs, libraries, documentation, debugging information, and whatever else into different artifacts. You can then choose to exclude some of these artifacts from your final system.

Split rules are defined separately for chunks, and for strata. Chunks define a list of artifacts, and a series of regular expression patterns that are matched against filenames to define what to include.
The list of artifacts is evaluated *in order*, so if two artifacts match the same pattern, those files will go into whichever artifact is defined first. Stratum matches work similarly, except the stratum artifacts are a series of patterns that match the *names of chunk artifacts*.

Here is an example DEFAULTS file that sets up some simple splitting rules.

    split-rules:
      chunk:
        - artifact: -devel
          include:
            - (usr/)?include/.*
            - (usr/)?share/man/.*
        - artifact: -runtime
          include:
            - .*
      stratum:
        - artifact: -devel
          include:
            - .*-devel
        - artifact: -runtime
          include:
            - .*-runtime

You could use these rules to keep header files (/usr/include/\*) and manual pages (/usr/share/man/\*) out of your systems, by only including the -runtime stratum artifacts.

Build environment
-----------------

### Sandboxing

Builds should be done in an isolated 'staging area', with only the specified dependencies available to the build process. The simplest approach is to install the dependencies in an empty directory, then [chroot](https://en.wikipedia.org/wiki/Chroot) into it. The more sandboxing the build tool can do, the better, because it lowers the chance of unexpected and unreproducible errors in the build process. The [Sandboxlib](https://github.com/CodethinkLabs/sandboxlib) Python library may be useful.

The exception to the above is if the 'build-mode' field for a chunk is set to 'bootstrap'. Chunks in bootstrap mode are treated specially and do have access to tools from the host system.

FIXME: more detail is needed here!

### Environment variables

The following environment variables can be used in chunk configure/build/install commands, and must be defined by the build tool.

- `MORPH_ARCH`: the Morph-specific architecture name; see for a list of valid architectures
- `PREFIX`: the value of the 'prefix' field for this chunk (set in the stratum .morph file); default /usr
- `TARGET`: the [GNU architecture triplet](http://wiki.osdev.org/Target_Triplet) for the target architecture (for example, x86_64-baserock-linux)
- `TARGET_STAGE1`: the 'bootstrap' variant of the GNU architecture triplet. This must be different from $TARGET -- you can just change the vendor field to achieve that (e.g. x86_64-bootstrap-linux).

FIXME: The `TARGET` and `TARGET_STAGE1` fields are specific to building GNU/Linux based systems, they shouldn't be mandated in the spec.

[JSON-Schema]: https://www.json-schema.org/
[Morph]: http://wiki.baserock.org/Morph/
[YBD]: http://wiki.baserock.org/ybd/
[morph.git]: git://git.baserock.org/cgit/baserock/baserock/morph.git/
[spec.git]: git://git.baserock.org/cgit/baserock/baserock/spec.git/
[YAML]: http://yaml.org/
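
As a rough illustration of the environment variables listed under 'Environment variables', a build tool might assemble them along these lines. This is a sketch only: the function names `stage1_triplet` and `build_environment` are hypothetical, and deriving `TARGET_STAGE1` by swapping the vendor field is just the approach suggested in the text, not a requirement.

```python
def stage1_triplet(target, vendor="bootstrap"):
    """Derive a bootstrap triplet by replacing the vendor field of `target`.

    Assumes `target` has the form ARCH-VENDOR-OS, e.g. x86_64-baserock-linux.
    """
    parts = target.split("-", 2)
    if len(parts) != 3:
        raise ValueError(f"expected an ARCH-VENDOR-OS triplet, got {target!r}")
    arch, _old_vendor, rest = parts
    stage1 = f"{arch}-{vendor}-{rest}"
    if stage1 == target:
        # The text above requires TARGET_STAGE1 to differ from TARGET.
        raise ValueError("TARGET_STAGE1 must differ from TARGET")
    return stage1


def build_environment(morph_arch, target, prefix="/usr"):
    """Return the variables a chunk's commands may rely on, as a dict.

    `prefix` defaults to /usr, matching the default given for PREFIX.
    """
    return {
        "MORPH_ARCH": morph_arch,
        "PREFIX": prefix,
        "TARGET": target,
        "TARGET_STAGE1": stage1_triplet(target),
    }


if __name__ == "__main__":
    # Example values only; "x86_64" is used here as an illustrative arch name.
    for key, value in build_environment("x86_64", "x86_64-baserock-linux").items():
        print(f"{key}={value}")
```

Such a dict could be merged into the environment of the sandboxed build process before the configure, build and install commands are run.
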