Schemas for the Baserock definitions format
===========================================

The starting point for learning about the Baserock definitions format is the
wiki page at <http://wiki.baserock.org/definitions/>.

The schemas/ directory in the Baserock reference definitions.git repository is
the canonical home for some schemas which describe the format in a
machine-readable way.

There are two parts to 'Baserock definitions'. The 'Baserock data model' is an
abstract vocabulary for describing how to build, integrate and deploy software
components. The 'Baserock definitions YAML representation format' is a
serialisation format for the data model, which lets you write YAML files
describing how to build, integrate and deploy software components.

If you want to make the YAML files easier to deal with, you only need
to care about the JSON-Schema schemas and anything that parses the YAML files.

If you want to write a new tool to build, visualise, analyse or otherwise
process Baserock definitions in some way, you can ignore the syntax altogether,
use a pre-existing parser, and just think in terms of the data
model.

If you want to change the data model, you still have quite a difficult job,
but at least it should be simple to write a translation layer on top of an
existing parser so that you can interpret all the existing Baserock reference
system definitions in terms of your new data model.


The Baserock definitions YAML representation format
---------------------------------------------------

YAML itself is a syntax for representating data as text. The YAML specification
is at <http://www.yaml.org/>.

The data needs to be structured in a certain way for it to make sense as
Baserock build/integration/deployment instructions. We have used JSON-Schema
to describe the required layout of the data.

The JSON-Schema standard is described at <http://json-schema.org/>. The
JSON-Schema language was designed for use with JSON, which is another syntax
for representing data as text, which happens to be a subset of YAML. We have
found so far that JSON-Schema works well with YAML, at least when using the
Python 'jsonschema' module.

Definitions are represented by files with a '.morph' extension. There are four
different kinds: 'chunk', 'stratum', 'system', and 'cluster'. Each of these is
described with a different .json-schema file. It is possible to merge all these
into one file, and use the 'oneOf' field to say that any .morph file should
match exactly one of the layouts. The only issue with this approach is that
the Python 'jsonschema' model will give you totally useless errors if anything
is invalid (along the lines of "<dump of entire file> is not valid under any of
the given schemas"). So for now they are separate.


Tools for working with the Baserock YAML schemas
------------------------------------------------

You can use `scripts/yaml-jsonschema` to validate .morph files against the
schemas. For example:

    scripts/yaml-jsonschema schemas/cluster.json-schema clusters/*.morph


The Baserock data model
-----------------------

The best way to represent information on disk may be a pretty inefficient way
to represent that data in a computer's memory. Likewise, the way a program
stores data internally may be totally impractical for people to edit directly.

The file `baserock.owl` is an initial effort to describe the Baserock data
model independently of any syntax or representation.

We use the W3C standard Web Ontology Language (OWL), combined with the much
simpler RDF Schema language. Together, this allows defining the vocabulary we
can use to define build, integration and deployment instructions. There are
various ways to represent OWL 'ontologies'; `baserock.owl` uses a
representation format named Turtle, which is designed to be convenient for
hand-editing.

The current data model is very closely tied to the current syntax, but we are
looking to change this and make it much more generic. This will involve
removing the current 'Chunk', 'Stratum', 'System' and 'Cluster' classes, and
adding something like 'thing with build instructions' and 'thing that contains
other things' instead. Name suggestions are welcome :-)

It's useful to consider existing OWL and RDF Schema vocabularies that are
related to the Baserock data model. In future we can link the Baserock
reference system definitions with related data published elsewhere on the Web.
Here is an incomplete list:

  - Description of a Project (DOAP): https://github.com/edumbill/doap
  - Software Ontology: https://robertdavidstevens.wordpress.com/2014/06/19/the-software-ontology-swo/
  - Software Packet Data Exchange (SPDX): https://spdx.org/about-spdx/what-is-spdx


Tools for working with the Baserock data model schema
-----------------------------------------------------

It's difficult to find to a short, relevant 'getting started' guide. The
website http://www.linkeddata.org/ has a lot of background that should be
useful.

The `rapper` commandline tool, which comes as part of the 'raptor2' C library,
is helpful for converting from one syntax to another, and checking if
`baserock.owl` is valid Turtle syntax. The 'raptor2' homepage is
<http://www.librdf.org/>.

To check the syntax of `baserock.owl` using `rapper`:

    rapper -i turtle schemas/baserock.owl


Omissions / TODO items
----------------------

- Device nodes: chunk .morph files can list a set of device nodes. In
  `chunk.json-schema` this is recognised, but in `baserock.owl` it is missing.

- 'Lorry' mirroring instructions. These contain information on where 'upstream'
  source code is kept, which should be considered part of the data model. A
  JSON schema may be better off in lorry.git or
  baserock/local-config/lorries.git.

- Metadata in built systems. This is currently not standardised at all.


Comments
--------

As far as I know, Baserock is the first project to treat build, integration and
deployment instructions as data rather than code. If you have questions about
the schemas, the definitions format, or the overall approach, and they aren't
answered here or in <http://wiki.baserock.org/definitions/>, then please ask on
the baserock-dev@baserock.org mailing list.