Schemas for the Baserock definitions format =========================================== The starting point for learning about the Baserock definitions format is the wiki page at . The schemas/ directory in the Baserock reference definitions.git repository is the canonical home for some schemas which describe the format in a machine-readable way. There are two parts to 'Baserock definitions'. The 'Baserock data model' is an abstract vocabulary for describing how to build, integrate and deploy software components. The 'Baserock definitions YAML representation format' is a serialisation format for the data model, which lets you write YAML files describing how to build, integrate and deploy software components. If you want to make the YAML files easier to deal with, you only need to care about the JSON-Schema schemas and anything that parses the YAML files. If you want to write a new tool to build, visualise, analyse or otherwise process Baserock definitions in some way, you can ignore the syntax altogether, use a pre-existing parser, and just think in terms of the data model. If you want to change the data model, you still have quite a difficult job, but at least it should be simple to write a translation layer on top of an existing parser so that you can interpret all the existing Baserock reference system definitions in terms of your new data model. The Baserock definitions YAML representation format --------------------------------------------------- YAML itself is a syntax for representating data as text. The YAML specification is at . The data needs to be structured in a certain way for it to make sense as Baserock build/integration/deployment instructions. We have used JSON-Schema to describe the required layout of the data. The JSON-Schema standard is described at . The JSON-Schema language was designed for use with JSON, which is another syntax for representing data as text, which happens to be a subset of YAML. We have found so far that JSON-Schema works well with YAML, at least when using the Python 'jsonschema' module. Definitions are represented by files with a '.morph' extension. There are four different kinds: 'chunk', 'stratum', 'system', and 'cluster'. Each of these is described with a different .json-schema file. It is possible to merge all these into one file, and use the 'oneOf' field to say that any .morph file should match exactly one of the layouts. The only issue with this approach is that the Python 'jsonschema' model will give you totally useless errors if anything is invalid (along the lines of " is not valid under any of the given schemas"). So for now they are separate. Tools for working with the Baserock YAML schemas ------------------------------------------------ You can use `scripts/yaml-jsonschema` to validate .morph files against the schemas. For example: scripts/yaml-jsonschema schemas/cluster.json-schema clusters/*.morph The Baserock data model ----------------------- The best way to represent information on disk may be a pretty inefficient way to represent that data in a computer's memory. Likewise, the way a program stores data internally may be totally impractical for people to edit directly. The file `baserock.owl` is an initial effort to describe the Baserock data model independently of any syntax or representation. We use the W3C standard Web Ontology Language (OWL), combined with the much simpler RDF Schema language. Together, this allows defining the vocabulary we can use to define build, integration and deployment instructions. There are various ways to represent OWL 'ontologies'; `baserock.owl` uses a representation format named Turtle, which is designed to be convenient for hand-editing. The current data model is very closely tied to the current syntax, but we are looking to change this and make it much more generic. This will involve removing the current 'Chunk', 'Stratum', 'System' and 'Cluster' classes, and adding something like 'thing with build instructions' and 'thing that contains other things' instead. Name suggestions are welcome :-) It's useful to consider existing OWL and RDF Schema vocabularies that are related to the Baserock data model. In future we can link the Baserock reference system definitions with related data published elsewhere on the Web. Here is an incomplete list: - Description of a Project (DOAP): https://github.com/edumbill/doap - Software Ontology: https://robertdavidstevens.wordpress.com/2014/06/19/the-software-ontology-swo/ - Software Packet Data Exchange (SPDX): https://spdx.org/about-spdx/what-is-spdx Tools for working with the Baserock data model schema ----------------------------------------------------- It's difficult to find to a short, relevant 'getting started' guide. The website http://www.linkeddata.org/ has a lot of background that should be useful. The `rapper` commandline tool, which comes as part of the 'raptor2' C library, is helpful for converting from one syntax to another, and checking if `baserock.owl` is valid Turtle syntax. The 'raptor2' homepage is . To check the syntax of `baserock.owl` using `rapper`: rapper -i turtle schemas/baserock.owl Omissions / TODO items ---------------------- - Device nodes: chunk .morph files can list a set of device nodes. In `chunk.json-schema` this is recognised, but in `baserock.owl` it is missing. - 'Lorry' mirroring instructions. These contain information on where 'upstream' source code is kept, which should be considered part of the data model. A JSON schema may be better off in lorry.git or baserock/local-config/lorries.git. - Metadata in built systems. This is currently not standardised at all. Comments -------- As far as I know, Baserock is the first project to treat build, integration and deployment instructions as data rather than code. If you have questions about the schemas, the definitions format, or the overall approach, and they aren't answered here or in , then please ask on the baserock-dev@baserock.org mailing list.