Develocity Flaky Test Detection Guide

Flaky, or non-deterministic, tests are a serious and prevalent problem in modern software development. An unreliable test suite with flaky tests wastes developers' time by triggering investigations of test failures that were not caused by their code changes, and by delaying the integration of their code.

Often, tests that report flaky results are not themselves unreliable; the flakiness is caused by flawed production code or test infrastructure. It is therefore important to periodically identify and fix the most severe flaky tests.

Develocity provides Test Failure Analytics, which gives you tools for quicker root cause analysis.

How flaky test detection works

Develocity identifies flaky tests both within a single build and across multiple builds. Within a single build, a test outcome is marked as FLAKY if the test both fails and succeeds within the execution of a single Gradle task, Maven goal, Bazel target, or sbt task. When this occurs, flaky test analysis becomes available in Build Scans and in the Develocity Tests Dashboard.

Single-build detection typically requires retrying failed tests, which is an industry-standard way to identify flaky tests. Develocity also detects flaky tests if no retry mechanism is set up or if the flaky test did not succeed within the configured retry limit. Across builds, tests that were executed with the same task or goal inputs but had different outcomes are marked as cross-build flaky.
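The two detection rules described above can be pictured as small classification functions. The following is a minimal sketch for illustration, not Develocity's implementation. The names `classify_single_build`, `RecordedRun`, `inputs_hash`, and `cross_build_flaky` are hypothetical, as are the PASSED and FAILED labels; only FLAKY is named in this guide.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Sequence, Set


def classify_single_build(executions: Sequence[bool]) -> str:
    """Classify one test from its executions inside a single task, goal, or target.

    Each entry is True when that execution passed and False when it failed.
    """
    if not executions:
        raise ValueError("at least one execution is required")
    if all(executions):
        return "PASSED"  # illustrative label
    if not any(executions):
        return "FAILED"  # illustrative label
    # Mixed results in the same task execution: the test both failed and succeeded.
    return "FLAKY"


class RecordedRun(NamedTuple):
    """One execution of a test in some build (hypothetical record shape)."""
    test_name: str
    inputs_hash: str  # assumed fingerprint of the task/goal inputs
    passed: bool


def cross_build_flaky(runs: Iterable[RecordedRun]) -> Set[str]:
    """Return the tests that had different outcomes for the same inputs."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.inputs_hash)].add(run.passed)
    # Both True and False seen for one (test, inputs) pair means cross-build flaky.
    return {name for (name, _), seen in outcomes.items() if len(seen) > 1}
```

In this sketch, `classify_single_build([False, True])` returns `"FLAKY"`. A test that passes with one set of inputs and fails with a different set is not reported by `cross_build_flaky`, because the inputs changed between the runs.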

Flaky test detection setup

Common test execution frameworks such as JUnit provide mechanisms for retrying tests, typically requiring extra code to annotate tests that are known to be flaky.

However, enabling test retry via your build does not require source code changes and applies to your entire test suite. Importantly, this allows you to analyze newly introduced flaky tests in the Develocity Tests Dashboard.

Another important aspect to consider is whether to fail the build when flaky tests are encountered. Historically, retry mechanisms have allowed builds to succeed. When enabling test retry through Gradle or sbt, it is possible to enable flaky test detection without silencing flaky failures. However, this comes at the cost of continued developer disruption and should be considered carefully.
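The trade-off can be made concrete with a small decision function. This is a sketch under assumptions, not the logic of any build tool: `build_should_fail` and `fail_on_flaky` are hypothetical names, with `fail_on_flaky` standing in for settings such as `failOnPassedAfterRetry` in the Gradle examples or `FlakyTestPolicy.Fail` in the sbt example later in this guide. The outcome labels are illustrative strings.

```python
from typing import Iterable


def build_should_fail(outcomes: Iterable[str], fail_on_flaky: bool) -> bool:
    """Decide whether a build fails, given per-test outcomes and a flaky policy."""
    seen = set(outcomes)
    if "FAILED" in seen:
        # A test that never passed fails the build regardless of the policy.
        return True
    # A flaky test fails the build only when the policy asks for it.
    return fail_on_flaky and "FLAKY" in seen
```

With `fail_on_flaky=False`, a build whose only problem is a flaky test succeeds; with `fail_on_flaky=True`, the same build fails. In both cases the flaky outcome is still detected, and only the build verdict changes.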

Gradle

The configuration examples in this section assume you are using Develocity Gradle plugin 3.17 or later. For older versions, please refer to the (Legacy) Gradle Enterprise Gradle Plugin User Manual.

The Develocity Gradle plugin version 3.12 or above integrates the test retry functionality offered by the Test Retry Gradle plugin.

build.gradle.kts

tasks.withType<Test>().configureEach {
    develocity.testRetry {
        if (System.getenv().containsKey("CI")) {
            maxRetries.set(3)
            failOnPassedAfterRetry.set(true)
        }
    }
}

build.gradle

tasks.named('test', Test) {
    develocity.testRetry {
        if (System.getenv().containsKey("CI")) {
            maxRetries = 3
            failOnPassedAfterRetry = true
        }
    }
}

See test retry functionality in the Develocity documentation to learn about all the available features and configuration options.

When using an older version of the Develocity Gradle plugin, you can use the test retry functionality of the Test Retry Gradle plugin version 1.1.4 or above.

build.gradle.kts

plugins {
    id("org.gradle.test-retry") version "1.5.10"
}

tasks.withType<Test>().configureEach {
    retry {
        if (System.getenv().containsKey("CI")) {
            maxRetries.set(3)
            failOnPassedAfterRetry.set(true)
        }
    }
}

build.gradle

plugins {
    id('org.gradle.test-retry') version '1.5.10'
}

tasks.named('test', Test) {
    retry {
        if (System.getenv().containsKey("CI")) {
            maxRetries = 3
            failOnPassedAfterRetry = true
        }
    }
}

See the Test Retry Gradle plugin documentation and introductory blog post to learn about all the useful features and configuration options.

Maven

The Maven Surefire and Failsafe plugins provide configuration properties which cause the test runner to retry each failing test a configured number of times.

Configuring these properties in your project causes failing tests to be rerun immediately after they fail. If a test fails and then passes on a rerun, Develocity will record a FLAKY outcome for the test.

pom.xml

<properties>
    <failsafe.rerunFailingTestsCount>2</failsafe.rerunFailingTestsCount>
    <surefire.rerunFailingTestsCount>2</surefire.rerunFailingTestsCount>
</properties>

Test retry works the same way when using Test Distribution with the Develocity Maven Extension.

See the maven-surefire-plugin documentation and maven-failsafe-plugin documentation for compatibility and configuration details.
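The rerun behavior described in this section can be pictured as a simple loop. This is a minimal sketch, not Surefire's or Failsafe's implementation. The function name `run_with_reruns` and the modeling of a test as a callable that returns True on success are hypothetical. The sketch assumes that a rerun count of N allows up to N additional executions after the first failure, and that reruns stop at the first pass.

```python
from typing import Callable, List


def run_with_reruns(test: Callable[[], bool], rerun_count: int) -> List[bool]:
    """Run a test once, then rerun it while it fails, up to rerun_count reruns.

    Returns the result of every execution, in order (True means passed).
    """
    results = [test()]
    # len(results) - 1 is the number of reruns performed so far.
    while not results[-1] and len(results) <= rerun_count:
        results.append(test())
    return results


# Usage: a scripted test that fails once and then passes.
scripted = iter([False, True])
history = run_with_reruns(lambda: next(scripted), rerun_count=2)
```

Here `history` is `[False, True]`: a failure followed by a pass, which is the pattern recorded as FLAKY. A test that always fails with `rerun_count=2` is executed three times in total.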

Bazel

Bazel provides a common flaky attribute for test rules. When it is set, Bazel runs the test up to three times and marks it as failed only if every attempt fails, which is equivalent to specifying --flaky_test_attempts=3 for test runs.

If any subsequent test execution passes after a failure, the test is marked as FLAKY, and the test target may succeed.

BUILD

java_test(
    name = "foo",
    flaky = True
)

See the Bazel user manual for more information.

sbt

The Develocity sbt plugin version 1.0 or above supports test retry.

build.sbt

ThisBuild / develocityConfiguration :=
  DevelocityConfiguration(
    testRetryConfiguration = TestRetryConfiguration()
      .withMaxRetries(if (sys.env.contains("CI")) 3 else 0)
      .withFlakyTestPolicy(FlakyTestPolicy.Fail)
  )

See the section Using Test Retry in the Develocity sbt plugin user manual to learn about all the useful features and configuration options.

Resources for flaky test analysis

With flaky test detection enabled, you will be able to identify the most severe flaky tests and their trends using Develocity Test Failure Analytics.

Here are some resources which show you how to best leverage these tools:

Preventing Flaky Tests from Ruining your Test Suite: more on flaky test detection, common sources of flakiness, and flaky test management strategies
Flaky Test Management with Develocity: a real-world example of debugging a flaky test in Gradle using Develocity
"Flaky Test Days" primer: how Gradle engineering uses "Flaky Test Days" to manage flaky tests
Develocity API samples: examples of how to automate flaky test management using the Develocity API