summaryrefslogtreecommitdiff
path: root/spec.mdwn
blob: 03fde9e060da4571a1efda414f2663c379995668 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
[[!meta title="Baserock definitions format"]]

The Baserock definitions format
===============================

This page describes the Baserock definitions format (morph files). It is intended to be useful as
an *informal* specification. It is not guaranteed to be accurate or exhaustive.

If you are just getting started with Baserock, the [[quick-start]], [[devel-with]] and [[guides]] pages provide a more practical introduction.

The allowed YAML constructs are described in json-schema format here: <http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/definitions.git/tree/schemas>.

The data model is described using OWL here: <http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/definitions.git/tree/schemas/baserock.owl>.

The source code of [[Morph]] and [[YBD]] might be more useful if you need a completely accurate description of how the current Baserock definition format is used in practice.

[[!toc startlevel=2 levels=2]]

Versioning
----------

The current version of the definitions format is version 7.

See also: the [[planned]] versions, and the [[historical]] versions of the
format.

Please propose changes to the format described here on the
[[baserock-dev@baserock.org mailing list|mailinglist]]. Ideally, provide a
patch for this file against the <git://baserock.branchable.com/> repo, but just
describing the change you want to make is also fine.

Definitions repository
----------------------

The design of Baserock aims to encourage users to keep *all* the information
needed to build and deploy their software in one Git repository. This repo is
often referred to as the 'definitions.git' repo, although nothing forces you
to call it that.

Some of this information will be Baserock 'definitions files', which describe
how to build or deploy some software. Baserock tooling should expect that all
the definitions it needs to process live in one Git repo. The definitions.git
repo can and should contain any other files needed for build and deployment as
well, such as configuration data and documentation.

The Baserock Project maintains a set of 'reference system definitions' at
[git://git.baserock.org/baserock/baserock/definitions] (which can also be
referred to as [baserock:baserock/definitions], when using the repo aliasing
feature of [[Morph]]). That repo contains systems that can be built and
deployed as-is, but it is important that users can fork this repo as well,
and work on systems in their version using `git merge` or `git rebase` to keep
up to date with changes from upstream.

Baserock tooling should not mandate anything about the definitions repo that
the user wants to process, other than the rules defined below.


[git://git.baserock.org/baserock/baserock/definitions]: http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/definitions.git
[baserock:baserock/definitions]: http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/definitions.git

### Structure

Tooling can enforce that the definitions.git repo is actually a Git repo, but
it can equally just treat it as a tree of files and directories.

The top directory of the repo must contain a file named `VERSION`, that is
valid [YAML] and contains a dict with key "version" and a value that is an
integer.

The integer specifies the version of the definitions format that this repo
uses. A tool should refuse to process a version that it doesn't support, to
avoid unpredictable errors. See the "Versioning" heading above for more detail
on versions.

To find all the Baserock definition files in the repo, tooling can recursively
scan the contents of the repo for files matching the glob pattern "\*.morph".

Deployment tooling should look in the toplevel directory /only/ for files
matching the following globs. (The purpose of these files is described in the
"deployment" section below).

  - `\*.check`
  - `\*.check.help`
  - `\*.configure`
  - `\*.configure.help`
  - `\*.write`
  - `\*.write.help`

Definitions file syntax
-----------------------

[YAML] is used for all Baserock definitions files.

The toplevel entity in a definition is a dict, in all cases. Any syntax errors
or type errors (such the toplevel entity being a number, or something) should
be reported to the user.

The [[Morph]] tool raises an error if any unknown dictionary keys are found in
the definition, mainly so that it reports any spelling errors in key names.

### Common fields

For all definitions, use the following fields:

* `name`: the name of the definition; it must currently match the filename
  (without the `.morph` suffix); **required**
* `kind`: the kind of thing being built; **required**
* `description`: a comment to describe what the definition is for; optional

### Build definitions: Chunks, Systems and Strata

Within this document, consider 'building' to be the act of running a series of
commands in a given 'environment', where the commands and how to build the
environment are completely specified by the definitions and the build tool.

For chunks, use the following fields:

* `build-system`: if the program is built using a build system known to
  `morph`, you can set this field and avoid having to set the various
  `*-commands` fields; the commands that the build system specifies can
  be overridden; the following build-systems are known:

  - `autotools`
  - `python-distutils`
  - `cpan`
  - `cmake`
  - `qmake`

  optional

* `pre-configure-commands`: a list of shell commands to run at
  the configuration phase of a build, before the list in `configure-commands`;
  optional
* `configure-commands`: a list of shell commands to run at the configuraiton
  phase of a build; optional
* `post-configure-commands`: a list of shell commands to run at
  the configuration phase of a build, after the list in `configure-commands`;
  optional

* `pre-build-commands`: a list of shell commands to run at
  the build phase of a build, before the list in `build-commands`;
  optional
* `build-commands`: a list of shell commands to run to build (compile) the
  project; optional
* `post-build-commands`: a list of shell commands to run at
  the build phase of a build, after the list in `build-commands`;
  optional

* `pre-test-commands`: a list of shell commands to run at
  the test phase of a build, before the list in `test-commands`;
  optional
* `test-commands`: a list of shell commands to run unit tests and other
  non-interactive tests on the built but un-installed project; optional
* `post-test-commands`: a list of shell commands to run at
  the test phase of a build, after the list in `test-commands`;
  optional

* `pre-install-commands`: a list of shell commands to run at
  the install phase of a build, before the list in `install-commands`;
  optional
* `install-commands`: a list of shell commands to ; optional
* `post-install-commands`: a list of shell commands to run at
  the install phase of a build, after the list in `strip-commands`;
  optional

* `pre-strip-commands`: a list of shell commands to run at
  the strip phase of a build, before the list in `strip-commands`;
  optional
* `strip-commands`: a list of shell commands to strip debug symbols from binaries;
  this should strip binaries in the directory named in the `DESTDIR` environment
  variable, not the actual system; optional
* `post-strip-commands`: a list of shell commands to run at
  the strip phase of a build, after the list in `install-commands`;
  optional

* `max-jobs`: a string to be given to `make` as the argument to the `-j`
  option to specify the maximum number of parallel jobs; the only sensible
  value is `"1"` (including the quotes), to prevent parallel jobs to run
  at all; parallel jobs are only used during the `build-commands` phase,
  since the other phases are often not safe when run in parallel; `morph`
  picks a default value based on the number of CPUs on the host system;
  optional

* `chunks`: a key/value map of lists of regular expressions;
  the key is the name
  of a binary chunk, the regexps match the pathnames that will be
  included in that chunk; the patterns match the pathnames that get installed
  by `install-commands` (the whole path below `DESTDIR`); every file must
  be matched by at least one pattern; by default, a single chunk gets
  created, named according to the definition, and containing all files;
  optional

For strata, use the following fields:

* `build-depends`: a list of strings, each of which refers to another
  stratum that the current stratum depends on. This list may be omitted
  or empty if the stratum does not depend on anything else.
* `chunks`: a list of key/value mappings, where each mapping corresponds
  to a chunk to be included in the stratum; the mappings may use the
  following keys:
    - `name` is the chunk's name (may be different from the
      morphology name),
    - `repo` is the repository in which to find (defaults to chunk name),
    - `ref` identifies the commit to use (typically a branch name, but
       any tree-ish git accepts is ok)
    - `morph` is a path, relative to the top of the definitions repo,
      to a chunk .morph file.
    - `build-system` specifies one of the predefined build systems. You
      must specify ONE of `morph` or `build-system` for each chunk.
  In addition to these keys, each of the sources can specify a list of
  build dependencies using the `build-depends` field. To specify one or
  more chunk dependencies, `build-depends` needs to be set to a list
  that contains the names of chunks that the source depends on in the
  same stratum. These names correspond to the values of the `name`
  fields of the other chunks.

  At the moment, the ordering is significant in chunk build-depends. This
  is used during bootstrapping, when you want to override the first build of
  a component with its second version in a staging area. This feature is kind
  of a workaround for the lack of distinction between build and runtime
  dependencies.

For systems, use the following fields:

* `strata`: a list of key/value mappings, similar to the 'chunks' field of a
  stratum. Two fields are allowed (are both required?):
    - `name`: name of the artifact when the stratum is build
    - `morph`: path to a stratum .morph file relative to the top of the containing repo

Example chunk (simplified commands):

    name: eglibc
    kind: chunk
    configure-commands:
    - mkdir o
    - cd o && ../libc/configure --prefix=/usr
    build-commands:
    - cd o && make
    install-commands:
    - cd o && make install_root="$DESTDIR" install

Example stratum:

    name: foundation
    kind: stratum
    chunks:
    - name: fhs-dirs
      repo: upstream:fhs-dirs
      ref: baserock/bootstrap
      build-depends: []
    - name: linux-api-headers
      repo: upstream:linux
      ref: baserock/morph
      build-depends:
      - fhs-dirs
    - name: eglibc
      repo: upstream:eglibc
      ref: baserock/bootstrap
      build-depends:
      - linux-api-headers
    - name: busybox
      repo: upstream:busybox
      ref: baserock/bootstrap
      build-depends:
      - fhs-dirs
      - linux-api-headers

Example system:

    name: base
    kind: system
    strata:
    - morph: foundation
    - morph: linux-stratum

### Deployment definitions: Clusters

**NOTE**: The deployment mechanism specified here is quite abstract. Most of
the code used to do real-world deployments is currently tied to [[Morph]] and
kept in [morph.git].

For 'deployment', [[Morph]] defines an API for running 'extensions'. The
'cluster' and 'system' definitions together describe what extensions should be
run, and what should be set in their environment, in order to deploy the
system. See the "Deployment extension API" section below for how to find and
execute the extensions.

Within this document, consider "deployment" to be a process of first
post-processing a filesystem tree with one or more 'configure extensions', then 
performing an operation to convert and/or transfer the filesystem tree
using a 'write extension'.

A cluster morphology defines a list of systems to deploy, and for each system a
list of ways to deploy them. Use the following field:

* **systems**: a list of systems to deploy;
    the value is a list of mappings, where each mapping has the
    following keys:

    * **morph**: the system morphology to use in the specified
        commit.

    * **deploy**: a mapping where each key identifies a
        system and each system has at least the following keys:

        * **type**: identifies the type of deployment e.g. (kvm,
            pxeboot), and thus what '.write' extension should be used.
        * **location**: where the deployed system should end up
            at. The syntax depends on the '.write' extension chosen in the
            'type' field.

        Optionally, it can specify **upgrade-type** and
        **upgrade-location** as well, which should be interpreted in the same
        way.

        The system dictionary can have any number of other entries. These
        should be collected up and are passed to each '.configure' extension
        and to the '.write' extension, through the environment. The extensions
        can interpret any of them in any manner.

    * **deploy-defaults**: allows multiple deployments of the same
        system to share some settings, when they can. Default settings
        will be overridden by those defined inside the deploy mapping.

    * **subsystems**: structured in the same way as the 'systems' entry, this
        allows deploying something *within* a system. The Baserock reference
        definitions use this to provide an initramfs inside some of the
        reference systems.

Example:

    name: cluster-foo
    kind: cluster
    systems:
        - morph: devel-system-x86_64-generic.morph
            deploy:
                cluster-foo-x86_64-1:
                    type: extensions/kvm
                    location: kvm+ssh://user@host/x86_64-1/x86_64-1.img
                    upgrade-type: extensions/ssh-rsync
                    upgrade-location: root@localhost
                    HOSTNAME: cluster-foo-x86_64-1
                    DISK_SIZE: 4G
                    RAM_SIZE: 4G
                    VCPUS: 2
        - morph: devel-system-armv7-highbank
            deploy-defaults:
                type: extensions/pxeboot
                location: cluster-foo-pxeboot-server
            deploy:
                cluster-foo-armv7-1:
                    HOSTNAME: cluster-foo-armv7-1
                cluster-foo-armv7-2:
                    HOSTNAME: cluster-foo-armv7-2

### Repo URLs

Git repository locations can (and should) be abbreviated using the 'repo-alias' feature of Baserock definitions. This is a kind of [Compact URI](http://www.w3.org/TR/2009/CR-curie-20090116/). It currently only affects the 'repo' fields in a stratum .morph file.

For example, instead of writing this:

```
- name: fhs-dirs
  repo: git://git.baserock.org/baserock/baserock/fhs-dirs.git
  ref: master
```

You can write this:

```
- name: fhs-dirs
  repo: baserock:baserock/fhs-dirs.git
  ref: master
```

There are two repo aliases that *must* be defined:

 - `baserock:` (defaulting to git://git.baserock.org/baserock/)
 - `upstream:` (defaulting to git://git.baserock.org/delta/)

Baserock tools should allow changing these values. The main benefit of this compact URI scheme is that definitions are not tied to a specific Git server, or protocol. You can build against a mirror of the original Git server, or change the protocol that is used, just by altering the repo-alias configuration.

Build environment
-----------------

### Sandboxing

Builds should be done an isolated 'staging area', with only the specified dependencies available to the build process. The simplest approach is to install the dependencies in an empty directory, then [chroot](https://en.wikipedia.org/wiki/Chroot) into it. The more sandboxing the build tool can do, the better, because it lowers the chance of unexpected and unreproducible errors in the build process. The [Sandboxlib](https://github.com/CodethinkLabs/sandboxlib) Python library may be useful.

The exception to the above is if the 'build-mode' field for a chunk is set to 'bootstrap'. Chunks in bootstrap mode are treated specially and do have access to tools from the host system. 

FIXME: more detail is needed here!

### Environment variables

The following environment variables can be used in chunk configure/build/install commands, and must be defined by the build tool.

 - `MORPH_ARCH`: the Morph-specific architecture name; see <http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/morph.git/tree/morphlib/util.py#n473> for a list of valid architectures
 - `PREFIX`: the value of the 'prefix' field for this chunk (set in the stratum .morph file); default /usr
 - `TARGET`: the [GNU architecture triplet](http://wiki.osdev.org/Target_Triplet) for the target architecture (for example, x86_64-baserock-linux)
 - `TARGET_STAGE1`: the 'bootstrap' variant of the GNU architecture triplet. This must be different from $TARGET -- you can just change the vendor field to achieve that (e.g. x86_64-bootstrap-linux).

FIXME: The `TARGET` and `TARGET_STAGE1` fields are specific to building GNU/Linux based systems, they shouldn't be mandated in the spec.

Deployment extension API
------------------------

**NOTE**: This section is more of an explaination of how the `morph deploy`
command works than a general-purpose spec, in places. We hope to fix that in
future.

As noted in the previous section: within this document, consider "deployment"
to be a process of first post-processing a filesystem tree with one or more
'configure extensions', then performing an operation to convert and/or transfer
the filesystem tree using a 'write extension'.

### Deployment sequence

A Baserock deployment tool should, for each labelled deployment in the
'cluster', follow this sequence.

1. Run the .check extension for that deployment 'type', if one exists. Abort
   now, if it fails.

   The .check extensions exist only to work around the fact that (2), (3) and
   (4) may be slow, because programs that wait several minutes just to raise an
   error that could have been detected right away are very annoying.

2. Create a writeable temporary directory containing the contents of the system
   artifact.

3. For each 'subsystem' specified, recursively run steps 1, 3, 4 and 5. The
   intention is that the result of the subsystem deployments end up *inside*
   the temporary directory created in step 2 for the 'outer' deployment.

4. Run each .configure extension that was specified in the
   `configuration-extensions` field of the system morphology. The order of
   running these is, sadly, unspecified.

5. Run the .write extension for that deployment 'type'.

The deployment of that system and its subsystems is now considered complete,
and temporary data can be removed.

### Locating and running extensions

Deployment extensions are executable files that live in the top level directory
of the definitions.git repo. Configure extensions have the extension
`.configure`, check extensions have the extension `.check`, and write
extensions have the extension `.write`.

[[Morph]] will look for extensions inside the 'morphlib' Python package, in the
`exts/` dir, before looking in the definitions.git repo.

An extension must be executable using the [POSIX `exec()` system call]. We
encourage writing them as Python scripts and using a `#!/usr/bin/env python`
[hashbang], but any executable code is permitted.

The 'execution environment' for an extension is sadly unspecified at the moment.
[[Morph]] runs extensions without any kid of chroot environment, but each one
is run in a separate [mount namespace]. Running extensions chrooted into the
system being deployed does not make sense, as it may not contain the right
dependencies for the extensions to work, and it may expect to run on a
different, incompatible architecture to the system running the deployment in
any case.

A .check extension must be passed one commandline argument: the value of the
`location` or `upgrade-location` field.

A .configure extension must be passed one commandline argument: the path to
the unpacked filesystem tree, which it can modify in some way. It is expected
that a .configure extension will do nothing unless it is enabled using a
configuration variable, but it is up to the code in the .configure extension
to do this, and this behaviour is not currently enforced.

A .write extension must be passed two commandline arguments: the value of the
`location` or `upgrade-location` field, then the path to the unpacked
filesystem tree (which it could modify in some way).

All key/value pairs from the 'cluster' definition for the given labelled
deployment of a system must be passed to all extensions, as environment
variables. The `type`, `location`, `upgrade-type` and `upgrade-location` fields
do not need to be passed in.

Extensions are expected to set exit code 0 on success, and a non-zero value on
failure. If any extension returns non-zero, the deployment tool should abort
the deployment.

Extensions are expected to write status information to 'stdout', and any error
messages to 'stderr'. This is for user feedback ONLY, deployment tools should
not do anything with this data other than log and/or display it.

[[Morph]] sets an extra environment variable `MORPH_LOG_FD` which is a file
descriptor that the extension can write log messages to, in order for them to
appear in morph.log but not on stdout or stderr.

[POSIX `exec()` system call]: http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html
[hashbang]: https://en.wikipedia.org/wiki/Shebang_%28Unix%29
[mount namespace]: https://stackoverflow.com/questions/22889241/linux-understanding-the-mount-namespace-clone-clone-newns-flag#22889401

### Help for extensions

The .configure and .write extensions may provide .help files as documentation.
The .help file should be valid [YAML], containing a dict with the key `help`
and a string value.

For example, the `tar.write` extension could provide `tar.write.help` with the
following content:

    help: |
        Deploy a system as a .tar file.

### Common configuration parameters used by extensions

Fill me in!

[morph.git]: git://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/morph
[YAML]: http://yaml.org/