Title: Baserock definitions format

The Baserock definitions format
===============================

This page describes the Baserock definitions format (morph files). It is intended to be useful as
an *informal* specification. It is not guaranteed to be accurate or exhaustive.

If you are just getting started with Baserock, the wiki pages [quick-start](http://wiki.baserock.org/quick-start), [devel-with](http://wiki.baserock.org/devel-with) and [guides](http://wiki.baserock.org/guides) provide a more practical introduction.

The allowed YAML constructs are described in json-schema format here: <http://git.baserock.org/cgit/baserock/baserock/spec.git/tree/schemas>.

The data model is described using OWL here: <http://git.baserock.org/cgit/baserock/baserock/spec.git/tree/schemas/baserock.owl>.

The source code of [Morph] and [YBD] might be more useful if you need a completely accurate description of how the current Baserock definition format is used in practice.

Versioning
----------

The current version of the definitions format is version 8.

Definitions repository
----------------------

The design of Baserock aims to encourage users to keep *all* the information
needed to build and deploy their software in one Git repository. This repo is
often referred to as the 'definitions.git' repo, although nothing forces you
to call it that.

Some of this information will be Baserock 'definitions files', which describe
how to build or deploy some software. Baserock tooling should expect that all
the definitions it needs to process live in one Git repo. The definitions.git
repo can and should contain any other files needed for build and deployment as
well, such as configuration data and documentation.

The Baserock Project maintains a set of 'reference system definitions' at
[git://git.baserock.org/baserock/baserock/definitions] (which can also be
referred to as [baserock:baserock/definitions], when using the repo aliasing
feature of [Morph]). That repo contains systems that can be built and
deployed as-is. It is also important that users can fork this repo and work
on systems in their own version, using `git merge` or `git rebase` to keep
up to date with changes from upstream.

Baserock tooling should not mandate anything about the definitions repo that
the user wants to process, other than the rules defined below.


[git://git.baserock.org/baserock/baserock/definitions]: http://git.baserock.org/cgit/baserock/baserock/definitions.git
[baserock:baserock/definitions]: http://git.baserock.org/cgit/baserock/baserock/definitions.git

### Structure

Tooling can enforce that the definitions.git repo is actually a Git repo, but
it can equally just treat it as a tree of files and directories.

The top directory of the repo must contain a file named `VERSION` that is
valid [YAML] and contains a dict with the key "version", whose value is an
integer.

The integer specifies the version of the definitions format that this repo
uses. A tool should refuse to process a version that it doesn't support, to
avoid unpredictable errors. See also the [Versioning](#versioning) section.
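
The version check could look like the following minimal sketch. It assumes the `VERSION` file has already been parsed by a YAML library into a Python object. The names `SUPPORTED_VERSIONS` and `check_version` are hypothetical, and the set of supported versions is chosen only for illustration.

```python
# Hypothetical: the format versions that this imaginary tool understands.
SUPPORTED_VERSIONS = {7, 8}


def check_version(data, supported=SUPPORTED_VERSIONS):
    """Return the definitions format version, or raise ValueError.

    `data` is the already-parsed content of the VERSION file.
    """
    if not isinstance(data, dict) or "version" not in data:
        raise ValueError("VERSION must contain a dict with a 'version' key")
    version = data["version"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError("'version' must be an integer")
    if version not in supported:
        raise ValueError(f"unsupported definitions format version: {version}")
    return version
```

Raising an error for an unsupported version, rather than continuing with a warning, follows the advice above that a tool should refuse to process a version it doesn't support.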

The top directory of the repo can also contain a file named `DEFAULTS`. This
holds repo-wide 'build-system' and 'split-rules' information. See the
[defaults](#defaults) section below.

To find all the Baserock definition files in the repo, tooling can recursively
scan the contents of the repo for files matching the glob pattern "\*.morph".
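
As a rough sketch of that scan, the Python standard library's `pathlib` can do the recursive glob directly. The function name `find_definition_files` is hypothetical.

```python
from pathlib import Path


def find_definition_files(repo_root):
    """Return every '*.morph' file under repo_root, sorted for stable output."""
    return sorted(Path(repo_root).rglob("*.morph"))
```

Note that this also descends into hidden directories such as `.git`; whether a tool should skip those is not specified here.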

Definitions file syntax
-----------------------

[YAML] is used for all Baserock definitions files.

The toplevel entity in a definition is a dict, in all cases. Any syntax errors
or type errors (such as the toplevel entity being a number) should be reported
to the user.

The [Morph] tool raises an error if any unknown dictionary keys are found in
the definition, mainly so that it reports any spelling errors in key names.

### Common fields

For all definitions, use the following fields:

* `name`: the name of the definition; it must currently match the filename
  (without the `.morph` suffix); **required**
* `kind`: the kind of thing being built; **required**
* `description`: a comment to describe what the definition is for; optional
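
A tool might check these fields along the following lines. This is a sketch under some assumptions: the definition is already parsed into a Python object, "filename" is taken to mean the base name of the path, and the set of valid kinds is taken to be the four kinds described in this document. `check_common_fields` and `KNOWN_KINDS` are hypothetical names.

```python
from pathlib import PurePosixPath

# Assumption: the four kinds of definition described in this document.
KNOWN_KINDS = {"chunk", "stratum", "system", "cluster"}


def check_common_fields(definition, path):
    """Return a list of problems found in the common fields (empty if none)."""
    if not isinstance(definition, dict):
        return ["toplevel entity must be a dict"]
    problems = []
    for field in ("name", "kind"):
        if field not in definition:
            problems.append(f"missing required field '{field}'")
    # Assumption: 'name' must equal the base filename minus its suffix.
    expected = PurePosixPath(path).stem
    if "name" in definition and definition["name"] != expected:
        problems.append(
            f"name {definition['name']!r} does not match filename {expected!r}")
    if "kind" in definition and definition["kind"] not in KNOWN_KINDS:
        problems.append(f"unknown kind {definition['kind']!r}")
    return problems
```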

Build definitions: Chunks, Systems and Strata
---------------------------------------------

Within this document, consider 'building' to be the act of running a series of
commands in a given 'environment', where the commands and how to build the
environment are completely specified by the definitions and the build tool.

### Chunks

A 'chunk' definition describes an individual component, which can be built from
a Git repository by executing the given sequence of commands.

The structure of a 'chunk' definition is described using [JSON-Schema] in the
[spec.git] repo:
<http://git.baserock.org/cgit/baserock/baserock/spec.git/tree/schemas/chunk.json-schema>

The build sequence consists of four phases:

1. configure
2. build
3. install
4. strip

You can define one or more commands for each phase, or none. Here is an example
chunk:

    name: glibc
    kind: chunk
    description: GNU C library (example)

    configure-commands:
    - mkdir o
    - cd o && ../libc/configure --prefix=/usr

    build-commands:
    - cd o && make

    install-commands:
    - cd o && make install_root="$DESTDIR" install

A chunk can optionally make use of a repo-wide build system defined in
['DEFAULTS'](#defaults), by using the `build-system` field.  In this case, the
command sequences defined in DEFAULTS are used.

Any predefined command sequence can be overridden by specifying a new value.
You can also extend a predefined build system with the `pre-` or `post-`
fields. For example, `pre-configure-commands` are run directly before
`configure-commands`, and all `post-configure-commands` are run directly after.

For example:

    name: git
    kind: chunk
    description: Git version control tool (example)

    build-system: autotools

    # This command will run before the normal 'configure' command sequence.
    pre-configure-commands:
    - make configure

    # This command overrides the normal 'build' command sequence.
    build-commands:
    - make all
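
The way overrides and `pre-`/`post-` fields combine can be sketched in Python as follows. This is an illustration, not the logic of any existing tool: `resolve_commands` is a hypothetical name, the definitions are assumed to be parsed into dicts already, and the contents of the `autotools` build system shown here are invented for the example.

```python
PHASES = ("configure", "build", "install", "strip")


def resolve_commands(chunk, build_systems):
    """Return the full, ordered command list for a chunk definition.

    `build_systems` is the 'build-systems' mapping from DEFAULTS.
    """
    name = chunk.get("build-system")
    # An unknown build system name raises KeyError.
    defaults = build_systems[name] if name else {}
    commands = []
    for phase in PHASES:
        key = f"{phase}-commands"
        commands += chunk.get(f"pre-{key}", [])
        # A value given in the chunk overrides the predefined sequence.
        commands += chunk[key] if key in chunk else defaults.get(key, [])
        commands += chunk.get(f"post-{key}", [])
    return commands


# Hypothetical contents for an 'autotools' build system in DEFAULTS.
build_systems = {
    "autotools": {
        "configure-commands": ['./configure --prefix="$PREFIX"'],
        "build-commands": ["make"],
        "install-commands": ['make DESTDIR="$DESTDIR" install'],
    }
}

# The 'git' chunk from the example above, as parsed YAML.
git_chunk = {
    "name": "git",
    "kind": "chunk",
    "build-system": "autotools",
    "pre-configure-commands": ["make configure"],
    "build-commands": ["make all"],
}

for command in resolve_commands(git_chunk, build_systems):
    print(command)
```

With these inputs, `make configure` runs first, the predefined configure and install commands are kept, and `make all` replaces the predefined build command.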

If a chunk doesn't need to override anything from ['DEFAULTS'](#defaults), you can avoid
having a chunk .morph file altogether, and just set 'build-system' when
referring to the chunk from the stratum.

The `max-jobs` field can be used to pass a custom value for --jobs to Make,
via the `MAKEFLAGS` environment variable. This is useful for Makefiles which
don't work in parallel: you can set `max-jobs: 1` to work around the problem.
Parallel jobs are only used during the `build-commands` phase, since the other
phases are often not safe when run in parallel; `morph` picks a default value
based on the number of CPUs on the host system.
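
One way a build tool could compute the `MAKEFLAGS` value per phase is sketched below. The function name `makeflags_for_phase` is hypothetical, and two details are assumptions rather than documented behaviour: that non-build phases get `-j1`, and that the default is one job per CPU.

```python
import os


def makeflags_for_phase(phase, max_jobs=None):
    """Return a MAKEFLAGS value for the given phase name."""
    if phase != "build":
        # Assumption: other phases are forced to run serially.
        return "-j1"
    if max_jobs is None:
        # Assumption: one job per CPU; a real tool may pick another default.
        max_jobs = os.cpu_count() or 1
    return f"-j{max_jobs}"
```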

Built chunks are split up into multiple artifacts. The default splitting rules
for a chunk are defined in ['DEFAULTS'](#defaults). You can use the `chunks` field of a
chunk to override these. The `chunks` field is a key/value map of lists of
regular expressions:

* each key is the name of a binary chunk;
* the regular expressions match the pathnames that will be included in that
  chunk; they are matched against the pathnames that get installed by
  `install-commands` (the whole path below `DESTDIR`);
* every file must be matched by at least one pattern.

By default, a single chunk gets created, named according to the definition,
and containing all files.

### Strata

A 'stratum' is a group of related chunks. A stratum can contain only chunks.
Certain information about how to build a chunk is defined in the containing
stratum, rather than in the chunk definition.

The structure of a 'stratum' definition is described using [JSON-Schema] in the
[spec.git] repo:
<http://git.baserock.org/cgit/baserock/baserock/spec.git/tree/schemas/stratum.json-schema>

The fields mean the following:

* `build-depends`: a list of strings, each of which refers to another
  stratum that the current stratum depends on. This list may be omitted
  or empty if the stratum does not depend on anything else.
* `chunks`: a list of key/value mappings, where each mapping corresponds
  to a chunk to be included in the stratum; the mappings may use the
  following keys:
    - `name` is the chunk's name (may be different from the
      morphology name),
    - `repo` is the repository in which to find the chunk's source (defaults
      to the chunk name),
    - `ref` identifies the commit to use (typically a branch name, but
       any tree-ish git accepts is ok)
    - `morph` is a path, relative to the top of the definitions repo,
      to a chunk .morph file.
    - `build-system` specifies one of the predefined build systems. You
      must specify ONE of `morph` or `build-system` for each chunk.
    - `submodules` is a list of key/value mappings that specifies url
       overrides for .gitmodules. The key should be the name of the submodule
       as listed in .gitmodules, not the path. The value is a dictionary
       containing one key/value pair:
        * `url`: the URL override for the submodule; this can include aliasing.
  In addition to these keys, each of the sources can specify a list of
  build dependencies using the `build-depends` field. To specify one or
  more chunk dependencies, `build-depends` needs to be set to a list
  that contains the names of chunks that the source depends on in the
  same stratum. These names correspond to the values of the `name`
  fields of the other chunks.

  At the moment, the ordering is significant in chunk build-depends. This
  is used during bootstrapping, when you want to override the first build of
  a component with its second version in a staging area. This feature is kind
  of a workaround for the lack of distinction between build and runtime
  dependencies.
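
As an illustration of how chunk `build-depends` can be used, the sketch below derives a build order for the chunks of one stratum with Python's standard `graphlib` module. The function name `chunk_build_order` is hypothetical. It treats a missing `build-depends` as an empty list, which is an assumption, and it does not model the significance of ordering within a `build-depends` list.

```python
from graphlib import TopologicalSorter  # available since Python 3.9


def chunk_build_order(stratum):
    """Return chunk names in an order that builds dependencies first."""
    names = {chunk["name"] for chunk in stratum["chunks"]}
    graph = {}
    for chunk in stratum["chunks"]:
        # Assumption: an omitted 'build-depends' means no dependencies.
        deps = chunk.get("build-depends", [])
        unknown = [dep for dep in deps if dep not in names]
        if unknown:
            raise ValueError(
                f"{chunk['name']} depends on unknown chunks: {unknown}")
        graph[chunk["name"]] = deps
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())
```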

### Systems

In the Baserock model, a 'system' is the top level entity that you actually
build and execute. Systems contain one or more strata.

The structure of a 'system' definition is described using [JSON-Schema] in the
[spec.git] repo:
<http://git.baserock.org/cgit/baserock/baserock/spec.git/tree/schemas/system.json-schema>

The fields mean the following:

* `strata`: a list of key/value mappings, similar to the 'chunks' field of a
  stratum. Two fields are allowed (are both required?):
    - `name`: name of the artifact when the stratum is built
    - `morph`: path to a stratum .morph file relative to the top of the containing repo

### Example stratum:

    name: foundation
    kind: stratum
    chunks:
    - name: fhs-dirs
      repo: upstream:fhs-dirs
      ref: baserock/bootstrap
      build-depends: []
    - name: linux-api-headers
      repo: upstream:linux
      ref: baserock/morph
      build-depends:
      - fhs-dirs
    - name: eglibc
      repo: upstream:eglibc
      ref: baserock/bootstrap
      build-depends:
      - linux-api-headers
    - name: busybox
      repo: upstream:busybox
      ref: baserock/bootstrap
      build-depends:
      - fhs-dirs
      - linux-api-headers
    - name: ansible
      repo: upstream:ansible
      ref: v2.0
      submodules:
        lib/ansible/modules/core:
          url: upstream:ansible-modules-core
        lib/ansible/modules/extras:
          url: upstream:ansible-modules-extras

### Example system:

    name: base
    kind: system
    strata:
    - morph: foundation
    - morph: linux-stratum

Deployment definitions: Clusters
--------------------------------

For 'deployment', Baserock defines an API for running 'extensions'. The
'cluster' and 'system' definitions together describe what extensions should be
run, and what should be set in their environment, in order to deploy the
system. See the [Deployment](deployment) section for how to find and execute
the extensions.

Within this document, consider "deployment" to be a process of first
post-processing a filesystem tree with one or more 'configure extensions', then 
performing an operation to convert and/or transfer the filesystem tree
using a 'write extension'.

The structure of the 'cluster' definitions is described using [JSON-Schema] in
the [spec.git] repo:
<http://git.baserock.org/cgit/baserock/baserock/spec.git/tree/schemas/cluster.json-schema>

A cluster morphology defines a list of systems to deploy, and for each system a
list of ways to deploy them. The fields are used as follows:

* **systems**: a list of systems to deploy;
    the value is a list of mappings, where each mapping has the
    following keys:

    * **morph**: the system morphology to use in the specified
        commit.

    * **deploy**: a mapping where each key identifies a
        system and each system has at least the following keys:

        * **type**: identifies the relative path, without extension, to the
            '.write' program that should be used for this system.
        * **location**: where the deployed system should end up. The syntax
            depends on the '.write' extension chosen in the 'type' field.

        Optionally, it can specify **upgrade-type** and
        **upgrade-location** as well, which should be interpreted in the same
        way.

        The system dictionary can have any number of other entries. These
        should be collected and passed to each '.configure' extension
        and to the '.write' extension, through the environment. The extensions
        can interpret any of them in any manner.

    * **deploy-defaults**: allows multiple deployments of the same
        system to share some settings, when they can. Default settings
        will be overridden by those defined inside the deploy mapping.

    * **subsystems**: structured in the same way as the 'systems' entry, this
        allows deploying something *within* a system. The Baserock reference
        definitions use this to provide an initramfs inside some of the
        reference systems.

Example:

    name: cluster-foo
    kind: cluster
    systems:
        - morph: devel-system-x86_64-generic.morph
          deploy:
              cluster-foo-x86_64-1:
                  type: extensions/kvm
                  location: kvm+ssh://user@host/x86_64-1/x86_64-1.img
                  upgrade-type: extensions/ssh-rsync
                  upgrade-location: root@localhost
                  HOSTNAME: cluster-foo-x86_64-1
                  DISK_SIZE: 4G
                  RAM_SIZE: 4G
                  VCPUS: 2
        - morph: devel-system-armv7-highbank
          deploy-defaults:
              type: extensions/pxeboot
              location: cluster-foo-pxeboot-server
          deploy:
              cluster-foo-armv7-1:
                  HOSTNAME: cluster-foo-armv7-1
              cluster-foo-armv7-2:
                  HOSTNAME: cluster-foo-armv7-2
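
The interaction between `deploy-defaults` and `deploy` can be sketched in Python as below. Both function names are hypothetical, and the input is assumed to be one entry of the `systems` list, already parsed from YAML. Leaving the four type and location keys out of the environment is one reading of "other entries" above, not confirmed behaviour.

```python
def effective_deployments(system_entry):
    """Map each deployment name to its settings, with defaults applied."""
    defaults = system_entry.get("deploy-defaults", {})
    return {
        # Settings inside 'deploy' override the shared defaults.
        name: {**defaults, **(settings or {})}
        for name, settings in system_entry.get("deploy", {}).items()
    }


def extension_environment(settings):
    """Return the entries to pass to extensions through the environment."""
    # Assumption: these four keys are consumed by the tool itself.
    reserved = {"type", "location", "upgrade-type", "upgrade-location"}
    # Environment variables must be strings, so values such as 2 become "2".
    return {key: str(value) for key, value in settings.items()
            if key not in reserved}
```

For the armv7 system in the example, both deployments would end up with `type: extensions/pxeboot` and `location: cluster-foo-pxeboot-server`, each with its own `HOSTNAME`.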

Repo URLs
---------

Git repository locations can (and should) be abbreviated using the 'repo-alias' feature of Baserock definitions. This is a kind of [Compact URI](http://www.w3.org/TR/2009/CR-curie-20090116/). It currently only affects the 'repo' fields in a stratum .morph file.

For example, instead of writing this:

```
- name: fhs-dirs
  repo: git://git.baserock.org/baserock/baserock/fhs-dirs.git
  ref: master
```

You can write this:

```
- name: fhs-dirs
  repo: baserock:baserock/fhs-dirs.git
  ref: master
```

There are two repo aliases that *must* be defined:

 - `baserock:` (defaulting to git://git.baserock.org/baserock/)
 - `upstream:` (defaulting to git://git.baserock.org/delta/)

Baserock tools should allow changing these values. The main benefit of this compact URI scheme is that definitions are not tied to a specific Git server, or protocol. You can build against a mirror of the original Git server, or change the protocol that is used, just by altering the repo-alias configuration.
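
A minimal sketch of alias expansion, using the two default values listed above, might look like this. The names `DEFAULT_ALIASES` and `expand_repo_alias` are hypothetical, and plain prefix substitution is an assumption; a real tool may support a richer alias syntax.

```python
DEFAULT_ALIASES = {
    "baserock": "git://git.baserock.org/baserock/",
    "upstream": "git://git.baserock.org/delta/",
}


def expand_repo_alias(repo, aliases=DEFAULT_ALIASES):
    """Expand 'alias:path' into a full URL; return other values unchanged."""
    prefix, separator, rest = repo.partition(":")
    if separator and prefix in aliases:
        return aliases[prefix] + rest
    # Not a known alias (for example a full git:// URL), so leave it alone.
    return repo
```

Passing a different `aliases` mapping is how a mirror or another protocol would be selected without touching the definitions themselves.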

Defaults
--------

A definitions repository can contain a file named DEFAULTS. This file sets up
repo-wide configuration.

The structure of the DEFAULTS file is described using [JSON-Schema] in
the [spec.git] repo:
<http://git.baserock.org/cgit/baserock/baserock/spec.git/tree/schemas/defaults.json-schema>

### Build systems

You can define common command sequences using the 'build-systems' key. These
can then be referenced in chunk (and stratum) definitions, to avoid repeating
common patterns.

Here is an example DEFAULTS file that defines the python-distutils build
system:

    build-systems:
      python-distutils:
        configure-commands: []
        build-commands:
        - python setup.py build
        install-commands:
        - python setup.py install --prefix "$PREFIX" --root "$DESTDIR"

### Splitting rules

One 'source' in Baserock can produce multiple binary 'artifacts'. This allows
you to separate programs, libraries, documentation, debugging information, and
whatever else into different artifacts. You can then choose to exclude some of
these artifacts from your final system.

Split rules are defined separately for chunks, and for strata.

Chunks define a list of artifacts, and a series of regular expression patterns
that are matched against filenames to define what to include. The list of
artifacts is evaluated *in order*, so if two artifacts match the same pattern,
those files will go into whichever artifact is defined first.

Stratum matches work similarly, except the stratum artifacts are a series of
patterns that match the *names of chunk artifacts*.

Here is an example DEFAULTS file that sets up some simple splitting rules.

    split-rules:
      chunk:
      - artifact: -devel
        include:
        - (usr/)?include/.*
        - (usr/)?share/man/.*
      - artifact: -runtime
        include:
        - .*

      stratum:
      - artifact: -devel
        include:
        - .*-devel
      - artifact: -runtime
        include:
        - .*-runtime

You could use these rules to keep header files (/usr/include/\*) and manual
pages (/usr/share/man/\*) out of your systems, by only including the -runtime
stratum artifacts.
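
The first-match behaviour of the chunk rules can be illustrated with the Python sketch below. The function name `split_files` is hypothetical. It makes three assumptions that are not confirmed by this document: the artifact name is the chunk name followed by the `artifact` value, paths are given relative to `DESTDIR` without a leading slash, and patterns are anchored at the start of the path (`re.match`).

```python
import re


def split_files(chunk_name, paths, rules):
    """Assign each path to the first artifact whose patterns match it.

    `rules` is the 'chunk' list from 'split-rules' in DEFAULTS.
    """
    # Assumption: artifact names are the chunk name plus the rule's suffix.
    artifacts = {chunk_name + rule["artifact"]: [] for rule in rules}
    for path in paths:
        for rule in rules:
            if any(re.match(pattern, path) for pattern in rule["include"]):
                artifacts[chunk_name + rule["artifact"]].append(path)
                break  # rules are evaluated in order; the first match wins
        else:
            # No rule matched this path at all.
            raise ValueError(f"no split rule matches {path!r}")
    return artifacts
```

With the example rules above, a header such as `usr/include/zlib.h` would land in the `-devel` artifact even though the catch-all `.*` pattern of `-runtime` also matches it, because `-devel` is listed first.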

Build environment
-----------------

### Sandboxing

Builds should be done in an isolated 'staging area', with only the specified dependencies available to the build process. The simplest approach is to install the dependencies in an empty directory, then [chroot](https://en.wikipedia.org/wiki/Chroot) into it. The more sandboxing the build tool can do, the better, because it lowers the chance of unexpected and unreproducible errors in the build process. The [Sandboxlib](https://github.com/CodethinkLabs/sandboxlib) Python library may be useful.

The exception to the above is if the 'build-mode' field for a chunk is set to 'bootstrap'. Chunks in bootstrap mode are treated specially and do have access to tools from the host system. 

FIXME: more detail is needed here!

### Environment variables

The following environment variables can be used in chunk configure/build/install commands, and must be defined by the build tool.

 - `MORPH_ARCH`: the Morph-specific architecture name; see <http://git.baserock.org/cgi-bin/cgit.cgi/baserock/baserock/morph.git/tree/morphlib/util.py#n473> for a list of valid architectures
 - `PREFIX`: the value of the 'prefix' field for this chunk (set in the stratum .morph file); default /usr
 - `TARGET`: the [GNU architecture triplet](http://wiki.osdev.org/Target_Triplet) for the target architecture (for example, x86_64-baserock-linux)
 - `TARGET_STAGE1`: the 'bootstrap' variant of the GNU architecture triplet. This must be different from $TARGET -- you can just change the vendor field to achieve that (e.g. x86_64-bootstrap-linux).
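
A build tool might assemble these variables roughly as follows. This is a sketch: `stage1_triplet` and `chunk_environment` are hypothetical names, and the caller is assumed to supply the architecture name and target triplet, since how they are chosen is not covered here.

```python
def stage1_triplet(target):
    """Derive TARGET_STAGE1 from TARGET by replacing the vendor field."""
    fields = target.split("-")
    if len(fields) < 3:
        raise ValueError(f"not an architecture triplet: {target!r}")
    fields[1] = "bootstrap"
    stage1 = "-".join(fields)
    if stage1 == target:
        # TARGET_STAGE1 must be different from TARGET.
        raise ValueError(f"vendor field of {target!r} is already 'bootstrap'")
    return stage1


def chunk_environment(morph_arch, target, prefix="/usr"):
    """Return the variables a build tool must define for chunk commands."""
    return {
        "MORPH_ARCH": morph_arch,
        "PREFIX": prefix,
        "TARGET": target,
        "TARGET_STAGE1": stage1_triplet(target),
    }
```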

FIXME: The `TARGET` and `TARGET_STAGE1` variables are specific to building GNU/Linux-based systems, so they shouldn't be mandated in the spec.

[JSON-Schema]: https://www.json-schema.org/
[Morph]: http://wiki.baserock.org/Morph/
[YBD]: http://wiki.baserock.org/ybd/
[morph.git]: git://git.baserock.org/cgit/baserock/baserock/morph.git/
[spec.git]: git://git.baserock.org/cgit/baserock/baserock/spec.git/
[YAML]: http://yaml.org/