Diffstat (limited to 'schemas/README.schemas')
1 files changed, 137 insertions, 0 deletions
diff --git a/schemas/README.schemas b/schemas/README.schemas
new file mode 100644
index 00000000..a7789187
--- /dev/null
+++ b/schemas/README.schemas
@@ -0,0 +1,137 @@
+Schemas for the Baserock definitions format
+The starting point for learning about the Baserock definitions format is the
+wiki page at <>.
+The schemas/ directory in the Baserock reference definitions.git repository is
+the canonical home for some schemas which describe the format in a
+machine-readable way.
+There are two parts to 'Baserock definitions'. The 'Baserock data model' is an
+abstract vocabulary for describing how to build, integrate and deploy software
+components. The 'Baserock definitions YAML representation format' is a
+serialisation format for the data model, which lets you write YAML files
+describing how to build, integrate and deploy software components.
+If you want to make the YAML files easier to deal with, you only need
+to care about the JSON-Schema schemas and anything that parses the YAML files.
+If you want to write a new tool to build, visualise, analyse or otherwise
+process Baserock definitions in some way, you can ignore the syntax altogether,
+use a pre-existing parser, and just think in terms of the data model.
+If you want to change the data model, you still have quite a difficult job,
+but at least it should be simple to write a translation layer on top of an
+existing parser so that you can interpret all the existing Baserock reference
+system definitions in terms of your new data model.
+The Baserock definitions YAML representation format
+YAML itself is a syntax for representing data as text. The YAML specification
+is at <>.
+The data needs to be structured in a certain way for it to make sense as
+Baserock build/integration/deployment instructions. We have used JSON-Schema
+to describe the required layout of the data.
+The JSON-Schema standard is described at <>. The
+JSON-Schema language was designed for use with JSON, another syntax for
+representing data as text, which happens to be a subset of YAML. We have
+found so far that JSON-Schema works well with YAML, at least when using the
+Python 'jsonschema' module.
+Definitions are represented by files with a '.morph' extension. There are four
+different kinds: 'chunk', 'stratum', 'system', and 'cluster'. Each of these is
+described with a different .json-schema file. It is possible to merge all these
+into one file, and use the 'oneOf' field to say that any .morph file should
+match exactly one of the layouts. The only issue with this approach is that
+the Python 'jsonschema' module will give you totally useless errors if anything
+is invalid (along the lines of "<dump of entire file> is not valid under any of
+the given schemas"). So for now they are separate.
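A minimal sketch of how a tool might pick the right schema for each file instead of relying on 'oneOf'. It assumes that every parsed .morph document is a mapping with a top-level 'kind' key and that the schema files are named '<kind>.json-schema'; only `cluster.json-schema` and `chunk.json-schema` are named in this README, so the other two file names are assumptions. The function names `schema_path_for` and `validate_morph` are hypothetical and are not part of `scripts/yaml-jsonschema`.

```python
from pathlib import Path

# The four kinds of definition named above.
KINDS = ("chunk", "stratum", "system", "cluster")


def schema_path_for(definition, schema_dir="schemas"):
    """Return the schema file for one parsed .morph document.

    Assumes the document is a mapping with a top-level 'kind' key and that
    schema files are named '<kind>.json-schema'.
    """
    kind = definition.get("kind") if isinstance(definition, dict) else None
    if kind not in KINDS:
        raise ValueError(f"unknown or missing kind: {kind!r}")
    return Path(schema_dir) / f"{kind}.json-schema"


def validate_morph(morph_path, schema_dir="schemas"):
    """Validate one .morph file against the schema for its kind."""
    # Third-party modules (PyYAML and jsonschema), imported here so that
    # schema_path_for() stays usable without them.
    import yaml
    import jsonschema

    with open(morph_path) as f:
        definition = yaml.safe_load(f)
    # Load the schema with the YAML parser too: JSON is a subset of YAML,
    # so this works whichever syntax the schema file is written in.
    with open(schema_path_for(definition, schema_dir)) as f:
        schema = yaml.safe_load(f)
    # Raises jsonschema.ValidationError if the document does not match.
    jsonschema.validate(instance=definition, schema=schema)
```

Because each file is checked against a single schema, a validation error refers to that one layout rather than reporting that the file matches none of four alternatives.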
+Tools for working with the Baserock YAML schemas
+You can use `scripts/yaml-jsonschema` to validate .morph files against the
+schemas. For example:
+ scripts/yaml-jsonschema schemas/cluster.json-schema clusters/*.morph
+The Baserock data model
+The best way to represent information on disk may be a pretty inefficient way
+to represent that data in a computer's memory. Likewise, the way a program
+stores data internally may be totally impractical for people to edit directly.
+The file `baserock.owl` is an initial effort to describe the Baserock data
+model independently of any syntax or representation.
+We use the W3C standard Web Ontology Language (OWL), combined with the much
+simpler RDF Schema language. Together, this allows defining the vocabulary we
+can use to define build, integration and deployment instructions. There are
+various ways to represent OWL 'ontologies'; `baserock.owl` uses a
+representation format named Turtle, which is designed to be convenient for
+people to read and write by hand.
+The current data model is very closely tied to the current syntax, but we are
+looking to change this and make it much more generic. This will involve
+removing the current 'Chunk', 'Stratum', 'System' and 'Cluster' classes, and
+adding something like 'thing with build instructions' and 'thing that contains
+other things' instead. Name suggestions are welcome :-)
+It's useful to consider existing OWL and RDF Schema vocabularies that are
+related to the Baserock data model. In future we can link the Baserock
+reference system definitions with related data published elsewhere on the Web.
+Here is an incomplete list:
+ - Description of a Project (DOAP):
+ - Software Ontology:
+ - Software Package Data Exchange (SPDX):
+Tools for working with the Baserock data model schema
+It's difficult to find a short, relevant 'getting started' guide. The
+website has a lot of background that should be
+The `rapper` command-line tool, which comes as part of the 'raptor2' C library,
+is helpful for converting from one syntax to another, and checking if
+`baserock.owl` is valid Turtle syntax. The 'raptor2' homepage is
+To check the syntax of `baserock.owl` using `rapper`:
+ rapper -i turtle schemas/baserock.owl
+Omissions / TODO items
+- Device nodes: chunk .morph files can list a set of device nodes. In
+ `chunk.json-schema` this is recognised, but in `baserock.owl` it is missing.
+- 'Lorry' mirroring instructions. These contain information on where 'upstream'
+ source code is kept, which should be considered part of the data model. A
+ JSON schema may be better off in lorry.git or
+ baserock/local-config/lorries.git.
+- Metadata in built systems. This is currently not standardised at all.
+As far as I know, Baserock is the first project to treat build, integration and
+deployment instructions as data rather than code. If you have questions about
+the schemas, the definitions format, or the overall approach, and they aren't
+answered here or in <>, then please ask on
+the mailing list.