The Develocity Build Cache follows a simple principle: the best way to do work faster is to avoid doing it at all. While sbt offers some APIs to implement incremental builds (besides incremental compilation), previous outputs are not shared between builds and machines. The Develocity Build Cache lifts this limitation and allows you to reuse outputs of previous builds that may have been executed on different machines. Thus, it avoids executing costly tasks and accelerates your sbt builds significantly.

In the diagram below, you can see the flow of CI agents pushing to the remote cache, and developers pulling from the remote cache.

Figure: typical caching scenario
  1. A task is executed on a CI server. The build is configured to push to a configured remote Build Cache, so that outputs can be reused by other CI pipeline builds and developer builds.

  2. A developer executes the same task with a local change to a file. The Develocity sbt plugin tries to load the output from the local Build Cache, then the remote Build Cache. Neither contains a matching entry due to the local change, so the task is executed. The output is stored in the local Build Cache. Outputs stored in the local Build Cache can be reused in subsequent builds on that developer’s machine.

  3. A second developer executes that task without any local changes from the commit that CI built. This time the remote Build Cache lookup is a hit, the cached output is downloaded and directly copied to the workspace and the local Build Cache. The task does not need to be executed.

This guide will show you how to get started with the sbt Build Cache provided by the Develocity sbt Plugin. The intended audience is build engineers who are looking to enable it for their existing builds. After you’ve seen the Build Cache in action, this guide will explain the basic concepts that are important to understand how the Build Cache works. You’ll learn how to measure the effectiveness of the Build Cache for your build and how to diagnose and solve common problems. Last but not least, this guide outlines how to roll out the Build Cache in your organization.

Getting started

The configuration examples in this document assume you are using Develocity sbt plugin 1.1 or later.

In order to enable build caching for your sbt project, you need to add the Develocity sbt plugin to your build. To do so, create project/plugins.sbt in the project root directory with the following content:

project/plugins.sbt
addSbtPlugin("com.gradle" % "sbt-develocity" % "1.1.1")

In addition, you need to configure the Develocity server in build.sbt.

build.sbt
ThisBuild / develocityConfiguration ~= { previous =>
  previous
    .withServer(
      previous.server
        .withUrl(url("https://develocity.mycompany.com"))
    )
}

Once you’ve done that, you’re ready to run your first sbt build that uses the Build Cache.

$ sbt clean compile
...
[info] compiling 1 Scala source to /Users/john/workspace/core/target/scala-2.12/classes ...
[info] compiling 1 Scala source to /Users/john/workspace/example/target/scala-2.12/classes ...
[success] Total time: 3 s, completed 4 Jul 2024, 14:27:04

Be sure to run the clean task first, because otherwise the plugin disables storing task outputs in the Build Cache. This is done to avoid accidentally including obsolete files, left over from before the build was invoked, in the cached outputs of a task.

As you can see from the sbt logs, all compile tasks were executed and none were loaded from cache. That’s not surprising since the build started with an empty cache.

Important concepts

In order to get the most out of the Build Cache, it is important to understand the basic concepts of how it works.

Inputs and outputs

The output of a task is the result it produces when executed. Its inputs are the results of the tasks and settings it depends on. For example, for sbt's compile task, the inputs are all Scala and Java source files as well as all configuration options (such as compiler flags) that influence the resulting class files, which are its outputs.
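To make this concrete, here is a minimal build.sbt sketch using a hypothetical task key that merely prints the values it reads; the source files and compiler flags it consumes are exactly the kind of inputs that determine a task's output:

build.sbt
// Hypothetical task key, for illustration only: it reads two typical compile inputs.
val showCompileInputs = taskKey[Unit]("Shows the main inputs of compilation")

Compile / showCompileInputs := {
  val srcs  = (Compile / sources).value        // Scala/Java source files: inputs
  val flags = (Compile / scalacOptions).value  // compiler options: inputs
  streams.value.log.info(s"${srcs.size} source files, flags: ${flags.mkString(" ")}")
}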

Cache key

Artifacts in the Build Cache are uniquely identified by a Build Cache key. A Build Cache key is assigned to each cacheable task execution when running with the Build Cache enabled and is used for both loading and storing outputs of task executions to the Build Cache. The inputs that contribute to the Build Cache key of a task execution are the outputs of the tasks and settings that the task depends on. Two task executions can reuse their outputs by using the Build Cache if their associated Build Cache keys are the same.
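As a purely conceptual sketch (not the plugin's actual implementation), you can think of the key as one hash computed over the hashes of all input components, so that identical components always yield the identical key:

// Conceptual illustration only: combine per-component hashes into a single cache key.
import java.security.MessageDigest

def combineCacheKey(componentHashes: Seq[String]): String = {
  val md = MessageDigest.getInstance("MD5")
  componentHashes.foreach(h => md.update(h.getBytes("UTF-8")))
  md.digest().map("%02x".format(_)).mkString // hex-encoded key
}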

Reproducible outputs

A task execution is said to have reproducible outputs if it will always generate the same outputs given the same inputs. Some tasks add extra information to their output that doesn’t depend on their inputs, e.g. a code generator might add a timestamp to the generated files. In such a case, re-executing the task will result in different outputs. Consequently, tasks that use these outputs as their inputs will need to be re-executed.
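A minimal build.sbt sketch of such a non-reproducible generator (a pattern to avoid; the object name is illustrative) could look like this:

build.sbt
// Anti-pattern: the timestamp changes on every execution, so identical inputs
// still produce different outputs.
Compile / sourceGenerators += Def.task {
  val out = (Compile / sourceManaged).value / "BuildTime.scala"
  IO.write(out, s"object BuildTime { val builtAt: Long = ${System.currentTimeMillis()}L }")
  Seq(out)
}.taskValue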

When a task is cacheable, the very nature of task output caching ensures that its executions will have the same outputs for a given set of inputs. Therefore, cacheable tasks should have reproducible outputs. Otherwise, the result of executing the task and loading its outputs from cache may be different, which can lead to hard-to-diagnose cache misses.

Stable inputs

The outputs of a task can only be loaded from cache if it has stable inputs. Unstable inputs result in frequent, unnecessary cache misses. For example, compiling tests depends on the result of compiling production code. Thus, in order for tasks to have stable inputs, the tasks they depend on should have reproducible outputs.

While we acknowledge that creating outputs that contain volatility (such as build timestamps) is a common practice for sbt builds, we see this as an antipattern because they drastically reduce the probability of cache hits. If, for example, one project in a multi-project build generates a build timestamp, the Build Cache has to assume that this timestamp is used by downstream projects. Therefore, all of them have to be rebuilt even if the timestamp is not actually used.

Timestamps in particular are of dubious significance. What does a timestamp tell us about the origin of the artifact? Does it help us to track it back to the CI job that created it? The answer is almost always "no", because the time the CI job was started is usually not identical to the time the artifact was generated. Instead, you should use something that uniquely identifies the code that was used to produce the artifact. A good candidate for this is the SCM revision number or commit ID.
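For example, a sketch of a source generator that embeds the commit ID instead of a timestamp might look like this (the object name and the use of git are assumptions about your setup):

build.sbt
// Embed the commit ID: the output only changes when the checked-out commit changes.
Compile / sourceGenerators += Def.task {
  val commit = scala.sys.process.Process("git rev-parse HEAD").!!.trim
  val out = (Compile / sourceManaged).value / "BuildCommit.scala"
  IO.write(out, s"""object BuildCommit { val commit: String = "$commit" }""")
  Seq(out)
}.taskValue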

Measuring cache effectiveness

Now that we understand the most important concepts of build caching, let us walk through an example of how to measure and improve cache effectiveness.

Rebuilding when nothing has changed

Let’s start off by running the build for the first time:

$ sbt clean compile
...
[info] compiling 1 Scala source to /Users/john/workspace/core/target/scala-2.12/classes ...
[info] compiling 1 Scala source to /Users/john/workspace/example/target/scala-2.12/classes ...
[success] Total time: 3 s, completed 4 Jul 2024, 14:27:04

When a task is executed, the Develocity sbt plugin first checks the local Build Cache for stored build results that may be reused. If no result is found in the local Build Cache, the remote Build Cache is queried. If neither provides a result, the task is executed and the output is stored in the local Build Cache. Since we just activated the Build Cache for the project, the local Build Cache as well as the remote Build Cache are empty and all tasks are executed.
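Conceptually, that lookup order can be sketched as follows; this only illustrates the flow described above and is not the plugin's real API:

// Conceptual sketch: local cache first, then remote, then execute and store.
trait BuildCache {
  def load(key: String): Option[Array[Byte]]
  def store(key: String, output: Array[Byte]): Unit
}

def resolve(local: BuildCache, remote: BuildCache, key: String)(run: () => Array[Byte]): Array[Byte] =
  local.load(key).getOrElse {
    remote.load(key) match {
      case Some(output) =>
        local.store(key, output) // a remote hit is also copied into the local cache
        output
      case None =>
        val output = run()       // miss in both caches: execute the task
        local.store(key, output) // keep the result for later builds on this machine
        output
    }
  }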

When we run the build a second time with a populated local cache, the build results of cacheable tasks should be retrieved from the cache:

$ sbt clean compile
[success] Total time: 1s, completed 4 Jul 2024, 14:28:30

This time, all results were loaded from the local Build Cache and no compilation task was executed.

Finding the cause of cache misses

In order to find the cause of a cache miss, we can enable verbose build cache logging, which makes it easy to identify the inputs that changed between builds.

Enabling verbose build cache logging has an impact on build performance and produces a significant amount of additional log messages. For these reasons it is disabled by default.

To enable verbose build cache logging, set the system property: develocity.internal.cache.verbose=true.

To make the build cache log messages more visible, you can also set the system property develocity.internal.cache.defaultLogLevel to the log level you want them logged at, such as info.

Combining the two system properties, the command to run the build with verbose build cache logging would look like this:

$ sbt -Ddevelocity.internal.cache.verbose=true -Ddevelocity.internal.cache.defaultLogLevel=info compile

With these additional system properties, Develocity will log the inputs of each task that attempts to retrieve its result from the Build Cache:

$ sbt -Ddevelocity.internal.cache.verbose=true -Ddevelocity.internal.cache.defaultLogLevel=info compile
[info] welcome to sbt 1.10.1 (BellSoft Java 21.0.1)
(...)
[info] Computing cache key 'proj / Compile / compile / develocityTaskCacheKey' (4 components):
[info] Hashing '. / thisProject' produced '641a7d11054d49795fac52da80768b9e'
[info] Hashing 'proj / Compile / unmanagedSources / inputFileStamps' produced 'f44bc74652ee7c2586fb01a3d6263d03'
[info] Hashing 'proj / Compile / develocityExternalDependencyClasspathStamps' produced 'bc764cd8ddf7a0cff126f51c16239658'
[info] Hashing 'Global / extraIncOptions' produced 'ae3b9c769f6389c0aaef766f48b3c4f9'
[info] Computed cache key 'proj / Compile / compile / develocityTaskCacheKey': '3f2c516d91677bbe810b373da00357b0'
(...)

By comparing the hashes between two executions, you should be able to identify which inputs changed between the two builds. Modifying these inputs so that they produce a stable hash will fix the cache misses.

Changed host

Last but not least, we need to ensure that the cache works even across machine boundaries. First, we’ll need to push some outputs to the remote cache, so we can use them from another machine:

$ sbt clean compile -Ddevelocity.cache.local.enabled=false -Ddevelocity.cache.remote.storeEnabled=true

Now we can log into another machine, maybe even one using another operating system, and check out the same commit of our project. Building it should retrieve all task outputs from the remote cache:

$ sbt clean compile -Ddevelocity.cache.local.enabled=false

If your project does not retrieve its outputs from the remote cache, follow the steps above to find the changing inputs.

Rolling out the cache in your organization

This chapter will show you how you can adjust the plugin’s settings to do a safe, staged roll-out of caching throughout your organization.

Enable the cache for a subset of your users

Once you’ve verified cache effectiveness on your own machine, you’ll probably want to allow a few other colleagues to try it out, without affecting everyone else on the team. You can do this by disabling the cache in the project’s build.sbt, unless the environment variable DEVELOCITY_CACHE_ENABLED is set.

build.sbt
ThisBuild / develocityConfiguration ~= { previous =>
  val cacheEnabled = sys.env.contains("DEVELOCITY_CACHE_ENABLED")

  previous
    .withBuildCache(
      previous.buildCache
        .withLocal(previous.buildCache.local.withEnabled(cacheEnabled))
        .withRemote(previous.buildCache.remote.withEnabled(cacheEnabled))
    )
}

and then letting your early adopters re-configure their shell in ~/.profile or ~/.zprofile:

~/.profile
export DEVELOCITY_CACHE_ENABLED=true

Make sure that your early adopters are seeing the same local Build Cache hit ratio that you had in your own experiments.

Enable the cache on CI

For your CI builds, changing settings in the user home is probably not an option, as that may affect other projects. Instead, you can modify the above snippet to also enable the cache when the CI environment variable is set (or another environment variable that your CI system sets):

build.sbt
ThisBuild / develocityConfiguration ~= { previous =>
  val isCi = sys.env.contains("CI")
  val cacheEnabled = sys.env.contains("DEVELOCITY_CACHE_ENABLED") || isCi

  previous
    .withBuildCache(
      previous.buildCache
        .withLocal(previous.buildCache.local.withEnabled(cacheEnabled))
        .withRemote(previous.buildCache.remote.withEnabled(cacheEnabled))
    )
}

At first, you may want to do this for a dedicated test pipeline, until you have convinced yourself that caching works well enough to roll it out to your main pipeline.

You can also use this configuration file to enable storing in the remote Build Cache, so that later builds can benefit from the outputs that your CI agents created.

build.sbt
ThisBuild / develocityConfiguration ~= { previous =>
  val isCi = sys.env.contains("CI")
  val cacheEnabled = sys.env.contains("DEVELOCITY_CACHE_ENABLED") || isCi

  previous
    .withBuildCache(
      previous.buildCache
        .withLocal(
          previous.buildCache.local
            .withEnabled(cacheEnabled)
        )
        .withRemote(
          previous.buildCache.remote
            .withEnabled(cacheEnabled)
            .withStoreEnabled(isCi)
        )
    )
}

We strongly recommend letting local developers only load from the remote cache and letting your CI servers store results in the remote cache. For this reason, storing in the remote Build Cache is disabled by default and has to be explicitly enabled.

Your CI builds will now populate the remote Build Cache. Your local builds should now get cache hits whenever they execute a task that has already been executed on CI with the same inputs. Make sure this works well for all your developers.

Make the best use of the cache on CI

Many projects have a pipeline with multiple stages, with many steps running in parallel. In order to get the most out of the Build Cache, we recommend running

$ sbt clean 'Test / compile'

as your first pipeline stage, so that all subsequent stages can reuse the compiled production and test code.

If you are using ephemeral CI agents, the local Build Cache will not give you any benefit, since it disappears together with the build agent. You can disable it to save some build time in this case.

build.sbt
ThisBuild / develocityConfiguration ~= { previous =>
  val isCi = sys.env.contains("CI")
  val cacheEnabled = sys.env.contains("DEVELOCITY_CACHE_ENABLED") || isCi

  previous
    .withBuildCache(
      previous.buildCache
        .withLocal(
          previous.buildCache.local
            .withEnabled(!isCi && cacheEnabled)
        )
        .withRemote(
          previous.buildCache.remote
            .withEnabled(cacheEnabled)
            .withStoreEnabled(isCi)
        )
    )
}

Use multiple nodes to reduce latency

The effectiveness of using a remote Build Cache is largely dictated by the network latency between the build and the cache. Develocity provides a built-in cache node at https://develocity.mycompany.com/cache. This is the node where outputs are stored and loaded by default. You can install additional nodes and connect them to Develocity. See the Build Cache Node User Manual for more details. Using a Build Cache node that is closer to where the builds are run can significantly reduce build times.

Each team member should configure the closest node in their shell ~/.profile or ~/.zprofile:

~/.profile
export DEVELOCITY_CACHE_URL=https://cache-eu.develocity.mycompany.com

Then modify your build to use this value:

build.sbt
ThisBuild / develocityConfiguration ~= { previous =>
  val isCi = sys.env.contains("CI")
  val cacheEnabled = sys.env.contains("DEVELOCITY_CACHE_ENABLED") || isCi
  val cacheUrl = sys.env.get("DEVELOCITY_CACHE_URL").map(url)

  previous
    .withBuildCache(
      previous.buildCache
        .withLocal(
          previous.buildCache.local
            .withEnabled(cacheEnabled)
        )
        .withRemote(
          previous.buildCache.remote
            .withEnabled(cacheEnabled)
            .withStoreEnabled(isCi)
            .withServer(
              previous.buildCache.remote.server
                .withUrl(cacheUrl)
            )
        )
    )
}

Enable the cache for everyone

Once you have convinced yourself that caching is working well for both your CI and local builds, you can remove the disable-by-default configuration from your project, so the cache is used by everyone.
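Assuming the gated configuration shown earlier, the resulting build.sbt snippet might then look roughly like the following sketch, which keeps the local cache disabled on ephemeral CI agents and remote storing restricted to CI, as recommended above:

build.sbt
ThisBuild / develocityConfiguration ~= { previous =>
  val isCi = sys.env.contains("CI")

  previous
    .withBuildCache(
      previous.buildCache
        .withLocal(previous.buildCache.local.withEnabled(!isCi))
        .withRemote(
          previous.buildCache.remote
            .withEnabled(true)
            .withStoreEnabled(isCi)
        )
    )
}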

Summary

This guide has introduced the Develocity Build Cache for sbt and explained the underlying concepts. You should now have the knowledge to adapt your own build so it can make effective use of the Build Cache. In addition, you have learned how to roll out the Build Cache in your organization. Please refer to the plugin user manual for a reference of all available configuration options.

Be aware that your journey does not end here. As with any performance optimization, it’s an ongoing process, not a one-time event. You should invest in keeping your build well-behaved and check regularly that you are still making effective use of the Build Cache. Build scans are an essential tool for keeping builds fast. You can learn more about them in the getting started guide.