author    Raoul Hidalgo Charman <raoul.hidalgocharman@codethink.co.uk> 2019-05-28 18:22:56 +0100
committer Raoul Hidalgo Charman <raoul.hidalgocharman@codethink.co.uk> 2019-05-31 10:08:54 +0100
commit    fe721b65df09497c0f07f5fe414ad5cb2db49095
tree      2e73585d995358e6f826d2aa06caa8fbcd205990
parent    de51a5a587bced0b2092573a092503352a789931
doc: Add architecture section on caches
Part of #1024
-rw-r--r-- doc/source/arch_caches.rst                      | 68
-rw-r--r-- doc/source/main_architecture.rst                |  1
-rw-r--r-- doc/source/using_configuring_artifact_server.rst |  2
3 files changed, 71 insertions, 0 deletions
diff --git a/doc/source/arch_caches.rst b/doc/source/arch_caches.rst
new file mode 100644
index 000000000..c415cfc47
--- /dev/null
+++ b/doc/source/arch_caches.rst
@@ -0,0 +1,68 @@
+
+.. _caches:
+
+
+Caches
+======
+
+BuildStream uses local caches to avoid repeating work, and can have remote
+caches configured to allow the results of work to be shared between multiple
+users. There are caches for both elements and sources that map keys to relevant
+metadata and point to data in CAS.
+
+Content Addressable Storage (CAS)
+---------------------------------
+
+The majority of data is stored in Content Addressable Storage (CAS), which
+indexes stored files by the SHA256 hash of their contents. This allows a flat
+file structure, and means any repeated data is stored only once within a CAS.
+To store directory structures, BuildStream's CAS uses `protocol buffers`_ to
+describe directory and file information, as defined in Google's `REAPI`_.
+
+:ref:`bst-artifact-server <artifact_command_reference>` runs a `grpc`_ CAS
+service (also defined in REAPI) that both the artifact and source caches use,
+allowing them to upload files to and download files from a remote server.
+
+Artifact caches
+---------------
+
+Artifacts store the build results of an element, and are referred to by the
+element's cache key (described in :ref:`cachekeys`). An artifact's information
+is stored in a protocol buffer, defined in ``artifact.proto``, which includes
+metadata such as the digest of the root of its files, its strong and weak
+keys, and the digests of its log files. These digests point to the locations
+of the relevant files and directories in the CAS, allowing BuildStream to
+query remote CAS servers for them.
+
+:ref:`bst-artifact-server <artifact_command_reference>` uses gRPC to implement
+a remote artifact service, which BuildStream uses to query, retrieve and
+update artifact information, before using that information to download the
+files and other data from the remote CAS.
+
+Source caches
+-------------
+
+Sources are cached by running the :meth:`Source.stage
+<buildstream.source.Source.stage>` method and capturing its directory output
+in the CAS, where it is then referred to by the source's key. The source key is
+calculated from the plugin-defined :meth:`Plugin.get_unique_key
+<buildstream.plugin.Plugin.get_unique_key>` and, if the source requires
+previous sources to be staged (e.g. the patch plugin), the unique keys of all
+sources listed before it in the element. Source caches are simpler than
+artifact caches, as they only need to map a source key to a directory digest,
+with no additional metadata.
+
+As with artifacts, :ref:`bst-artifact-server <artifact_command_reference>`
+uses gRPC to implement a 'reference service' API that allows BuildStream to
+query for these source digests, which can then be used to retrieve sources
+from a CAS.
+
+.. note::
+
+ Not all plugins produce the same result for a workspace as their staged
+ output. As a result, when initialising a workspace, BuildStream may need to
+ fetch the original source if it only has the source in the source cache.
+
+.. _protocol buffers: https://developers.google.com/protocol-buffers/docs/overview
+.. _grpc: https://grpc.io
+.. _REAPI: https://github.com/bazelbuild/remote-apis
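The content addressing and the two cache mappings described in arch_caches.rst above can be illustrated with a minimal Python sketch. It is not BuildStream's implementation: ToyCAS, Digest, ArtifactRecord and source_key are hypothetical names, directories are encoded here as sorted JSON rather than REAPI Directory protocol buffers, and the unique-key strings merely stand in for what Plugin.get_unique_key would return.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Digest:
    # Loosely modelled on REAPI's Digest message: content hash plus size.
    hash: str
    size_bytes: int


class ToyCAS:
    """Hypothetical in-memory CAS for illustration; not BuildStream's code."""

    def __init__(self):
        self.blobs = {}  # hex SHA256 of the content -> the content itself

    def add_blob(self, data: bytes) -> Digest:
        h = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so it is stored once.
        self.blobs.setdefault(h, data)
        return Digest(h, len(data))

    def add_directory(self, files: dict) -> Digest:
        # Store each file, then store a listing of name -> file hash.
        # Assumption: the listing is sorted JSON; REAPI uses a protobuf
        # Directory message instead.
        listing = {name: self.add_blob(data).hash for name, data in files.items()}
        return self.add_blob(json.dumps(listing, sort_keys=True).encode())


@dataclass(frozen=True)
class ArtifactRecord:
    # Hypothetical stand-in for the artifact.proto metadata: root digest of
    # the files, strong and weak keys, and digests of the log files.
    files: Digest
    strong_key: str
    weak_key: str
    logs: tuple = ()


def source_key(unique_key, previous_keys=(), requires_previous=False):
    # Hypothetical: a source that needs earlier sources staged (e.g. a patch)
    # folds their keys into its own key.
    material = {"unique": unique_key}
    if requires_previous:
        material["previous"] = list(previous_keys)
    encoded = json.dumps(material, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


cas = ToyCAS()
tree = cas.add_directory({"a.txt": b"hello", "b.txt": b"hello"})
print(len(cas.blobs))  # 2: "hello" is stored once, plus the listing

# Source cache: source key -> directory digest, with no other metadata.
tar_key = source_key("tar:example.tar.gz")
patch_key = source_key("patch:fix.diff", [tar_key], requires_previous=True)
source_cache = {patch_key: tree}

# Artifact cache: cache key -> metadata whose digests point into the CAS.
log = cas.add_blob(b"build log")
strong, weak = "example-strong-key", "example-weak-key"  # hypothetical keys
artifact_cache = {strong: ArtifactRecord(tree, strong, weak, (log,))}
```

Because a file is addressed only by its hash, the two identical files occupy a single blob; and because the patch source's key includes the tar source's key, changing the tarball also changes the key under which the patched source is cached.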
diff --git a/doc/source/main_architecture.rst b/doc/source/main_architecture.rst
index b0a117ed9..d9b9f3e50 100644
--- a/doc/source/main_architecture.rst
+++ b/doc/source/main_architecture.rst
@@ -14,6 +14,7 @@ This section provides details on the overall BuildStream architecture.
arch_dependency_model
arch_scheduler
arch_cachekeys
+ arch_caches
arch_sandboxing
arch_remote_execution
diff --git a/doc/source/using_configuring_artifact_server.rst b/doc/source/using_configuring_artifact_server.rst
index da61f0f80..6eb64113c 100644
--- a/doc/source/using_configuring_artifact_server.rst
+++ b/doc/source/using_configuring_artifact_server.rst
@@ -91,6 +91,8 @@ requiring BuildStream's more exigent dependencies by setting the
BST_ARTIFACTS_ONLY=1 pip3 install .
+.. _artifact_command_reference:
+
Command reference
~~~~~~~~~~~~~~~~~