author Jürg Billeter <j@bitron.ch> 2017-12-12 07:49:53 +0100
committer Jürg Billeter <j@bitron.ch> 2017-12-20 11:22:30 +0100
commit 29cc8d3f0e75f7452446a7effd1456062aade5db
tree 133f04c8e37b4caea43a0a4909e39ec13ac05fa9
parent 461ab56a19e86faf482bef2b7cbea1017d16d8d7
download buildstream-29cc8d3f0e75f7452446a7effd1456062aade5db.tar.gz
doc: Add cache key documentation (juerg/cache-key-doc)
-rw-r--r-- doc/source/cachekeys.rst 95
-rw-r--r-- doc/source/index.rst 5
2 files changed, 100 insertions, 0 deletions
diff --git a/doc/source/cachekeys.rst b/doc/source/cachekeys.rst
new file mode 100644
index 000000000..da8d049c1
--- /dev/null
+++ b/doc/source/cachekeys.rst
@@ -0,0 +1,95 @@
+.. _cachekeys:
+
+
+Cache Keys
+==========
+
+Cache keys for artifacts are generated from the inputs of the build process
+for the purpose of reusing artifacts in a well-defined, predictable way.
+
+Structure
+---------
+Cache keys are SHA256 hash values generated from a pickled Python dict that
+includes:
+
+* Environment (e.g., project configuration and variables)
+* Element configuration (details depend on element kind, see ``Element.get_unique_key()``)
+* Sources (``Source.get_unique_key()``)
+* Dependencies (depending on cache key type, see below)
+* Public data
+
+Cache Key Types
+---------------
+There are two types of cache keys in BuildStream, ``strong`` and ``weak``.
+
+The purpose of a ``strong`` cache key is to capture the state of as many aspects
+as possible that can have an influence on the build output. The aim is that
+builds will be fully reproducible as long as the cache key doesn't change,
+provided the module build systems are suitable (e.g., don't embed timestamps).
+
+A ``strong`` cache key includes the strong cache key of each build dependency
+(and their runtime dependencies) of the element as changes in build dependencies
+(or their runtime dependencies) can result in build differences in reverse
+dependencies. This means that whenever the strong cache key of a dependency
+changes, the strong cache key of its reverse dependencies will change as well.
+
+A ``weak`` cache key has an almost identical structure; however, it includes
+only the names of build dependencies, not their cache keys or their runtime
+dependencies. A weak cache key will thus still change when the element itself
+or the environment changes but it will not change when a dependency is updated.
+
+For elements without build dependencies the ``strong`` cache key is identical
+to the ``weak`` cache key.
+
+Strict Build Plan
+-----------------
+This is the default build plan that exclusively uses ``strong`` cache keys
+for the core functionality. An element's cache key can be calculated when
+the cache keys of the element's build dependencies (and their runtime
+dependencies) have been calculated and either tracking is not enabled or it
+has already completed for this element, i.e., the ``ref`` is available.
+This means that with tracking disabled the cache keys of all elements could be
+calculated right at the start of a build session.
+
+While BuildStream only uses ``strong`` cache keys with the strict build plan
+for the actual staging and build process, it will still calculate ``weak``
+cache keys for each element. This allows BuildStream to store the artifact
+in the cache with both keys, reducing rebuilds when switching between strict
+and non-strict build plans. If the artifact cache already contains an
+artifact with the same ``weak`` cache key, it's replaced. Thus, non-strict
+builds always use the latest artifact available for a given ``weak`` cache key.
+
+Non-strict Build Plan
+---------------------
+The non-strict build plan disables the time-consuming automatic rebuild of
+reverse dependencies at the cost of dropping the reproducibility benefits.
+It uses the ``weak`` cache keys for the core staging and build process:
+if an artifact is available with the calculated ``weak`` cache key,
+it will be reused for staging instead of being rebuilt. ``weak`` cache keys
+can be calculated early in the build session, once any tracking has completed,
+similar to ``strong`` cache keys with a strict build plan.
+
+Similar to how strict build plans also calculate ``weak`` cache keys, non-strict
+build plans also calculate ``strong`` cache keys. However, this is slightly
+more complex. To calculate the ``strong`` cache key of an element, BuildStream
+requires the ``strong`` cache keys of the build dependencies (and their runtime
+dependencies).
+
+The build dependencies of an element may have been updated since the artifact
+was built. With the non-strict build plan the artifact will still be reused.
+However, this means that we cannot use a ``strong`` cache key calculated purely
+based on the element definitions. We need a cache key that matches the
+environment at the time the artifact was built, not the current definitions.
+
+The only way to get the correct ``strong`` cache key is by retrieving it from
+the metadata stored in the artifact. As artifacts may need to be pulled from a
+remote artifact cache, the ``strong`` cache key is not readily available early
+in the build session. However, it can always be retrieved when an element is
+about to be built, as the dependencies are guaranteed to be in the local
+artifact cache at that point.
+
+``Element._get_cache_key_from_artifact()`` extracts the ``strong`` cache key
+from an artifact in the local cache. ``Element._get_cache_key_for_build()``
+calculates the ``strong`` cache key that is used for a particular build job.
+This key is used for the embedded metadata and also as the key under which
+the artifact is stored in the cache.
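Illustration for the cachekeys.rst text above: a minimal Python sketch of how ``strong`` and ``weak`` keys differ and how an artifact can be stored under both. This is not BuildStream's implementation. The helper names (`generate_key`, `weak_key`, `strong_key`, `store_artifact`), the dict layout, and the plain-dict cache are all hypothetical; the documentation only states that keys are SHA256 hashes of a pickled Python dict. For brevity, the sketch also leaves out the runtime dependencies of build dependencies, which the real strong key covers.

```python
import hashlib
import pickle


def generate_key(value):
    """Return the SHA256 hex digest of a pickled value (hypothetical helper)."""
    # A fixed pickle protocol keeps the byte stream stable within this sketch.
    return hashlib.sha256(pickle.dumps(value, protocol=2)).hexdigest()


def _base_inputs(element):
    # Inputs shared by both key types (assumed layout, for illustration only).
    return {
        "environment": element["environment"],
        "config": element["config"],
        "sources": element["sources"],
        "public": element["public"],
    }


def weak_key(element):
    """Weak key: only the *names* of the build dependencies are included."""
    data = _base_inputs(element)
    data["dependencies"] = sorted(element["build_deps"])
    return generate_key(data)


def strong_key(element, elements):
    """Strong key: the strong keys of the build dependencies are included."""
    data = _base_inputs(element)
    data["dependencies"] = [
        strong_key(elements[name], elements)
        for name in sorted(element["build_deps"])
    ]
    return generate_key(data)


def store_artifact(cache, element, elements, artifact):
    """Store the artifact under both keys, as described for the strict plan."""
    cache[strong_key(element, elements)] = artifact
    # An existing entry with the same weak key is replaced by the newer one.
    cache[weak_key(element)] = artifact
```

In this sketch, updating a build dependency changes the strong key of its reverse dependencies but leaves their weak keys untouched, so a lookup by weak key returns the most recently stored artifact.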
diff --git a/doc/source/index.rst b/doc/source/index.rst
index e62e23492..9e79d5aba 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -76,6 +76,11 @@ Creating Plugins
* :ref:`core_framework`
+Internals
+---------
+* :ref:`cachekeys`
+
+
Indices and tables
------------------
* :ref:`modindex`