diff options
Diffstat (limited to 'doc/source/additional_cachekeys.rst')
-rw-r--r-- | doc/source/additional_cachekeys.rst | 96 |
1 files changed, 96 insertions, 0 deletions
diff --git a/doc/source/additional_cachekeys.rst b/doc/source/additional_cachekeys.rst new file mode 100644 index 000000000..f0df796c5 --- /dev/null +++ b/doc/source/additional_cachekeys.rst @@ -0,0 +1,96 @@ + +.. _cachekeys: + + +Cache keys +========== + +Cache keys for artifacts are generated from the inputs of the build process +for the purpose of reusing artifacts in a well-defined, predictable way. + +Structure +--------- +Cache keys are SHA256 hash values generated from a pickled Python dict that +includes: + +* Environment (e.g., project configuration and variables) +* Element configuration (details depend on element kind, ``Element.get_unique_key()``) +* Sources (``Source.get_unique_key()``) +* Dependencies (depending on cache key type, see below) +* Public data + +Cache key types +--------------- +There are two types of cache keys in BuildStream, ``strong`` and ``weak``. + +The purpose of a ``strong`` cache key is to capture the state of as many aspects +as possible that can have an influence on the build output. The aim is that +builds will be fully reproducible as long as the cache key doesn't change, +with suitable module build systems that don't embed timestamps, for example. + +A ``strong`` cache key includes the strong cache key of each build dependency +(and their runtime dependencies) of the element as changes in build dependencies +(or their runtime dependencies) can result in build differences in reverse +dependencies. This means that whenever the strong cache key of a dependency +changes, the strong cache key of its reverse dependencies will change as well. + +A ``weak`` cache key has an almost identical structure, however, it includes +only the names of build dependencies, not their cache keys or their runtime +dependencies. A weak cache key will thus still change when the element itself +or the environment changes but it will not change when a dependency is updated. + +For elements without build dependencies the ``strong`` cache key is identical +to the ``weak`` cache key. + +Strict build plan +----------------- +This is the default build plan that exclusively uses ``strong`` cache keys +for the core functionality. An element's cache key can be calculated when +the cache keys of the element's build dependencies (and their runtime +dependencies) have been calculated and either tracking is not enabled or it +has already completed for this element, i.e., the ``ref`` is available. +This means that with tracking disabled the cache keys of all elements could be +calculated right at the start of a build session. + +While BuildStream only uses ``strong`` cache keys with the strict build plan +for the actual staging and build process, it will still calculate ``weak`` +cache keys for each element. This allows BuildStream to store the artifact +in the cache with both keys, reducing rebuilds when switching between strict +and non-strict build plans. If the artifact cache already contains an +artifact with the same ``weak`` cache key, it's replaced. Thus, non-strict +builds always use the latest artifact available for a given ``weak`` cache key. + +Non-strict build plan +--------------------- +The non-strict build plan disables the time-consuming automatic rebuild of +reverse dependencies at the cost of dropping the reproducibility benefits. +It uses the ``weak`` cache keys for the core staging and build process. +I.e., if an artifact is available with the calculated ``weak`` cache key, +it will be reused for staging instead of being rebuilt. ``weak`` cache keys +can be calculated early in the build session. After tracking, similar to +when ``strong`` cache keys can be calculated with a strict build plan. + +Similar to how strict build plans also calculate ``weak`` cache keys, non-strict +build plans also calculate ``strong`` cache keys. However, this is slightly +more complex. To calculate the ``strong`` cache key of an element, BuildStream +requires the ``strong`` cache keys of the build dependencies (and their runtime +dependencies). + +The build dependencies of an element may have been updated since the artifact +was built. With the non-strict build plan the artifact will still be reused. +However, this means that we cannot use a ``strong`` cache key calculated purely +based on the element definitions. We need a cache key that matches the +environment at the time the artifact was built, not the current definitions. + +The only way to get the correct ``strong`` cache key is by retrieving it from +the metadata stored in the artifact. As artifacts may need to be pulled from a +remote artifact cache, the ``strong`` cache key is not readily available early +in the build session. However, it can always be retrieved when an element is +about to be built, as the dependencies are guaranteed to be in the local +artifact cache at that point. + +``Element._get_cache_key_from_artifact()`` extracts the ``strong`` cache key +from an artifact in the local cache. ``Element._get_cache_key_for_build()`` +calculates the ``strong`` cache key that is used for a particular build job. +This is used for the embedded metadata and also as key to store the artifact in +the cache. |