Unit testing vs. integration testing

Unit tests and integration tests are often compared – and usually presented as opposites. In this post, I will examine these two types of tests from a different point of view and show how this perspective can help us write better tests.

A plea for a differentiated view

Unit tests are usually characterized as “small”, “fast”, and “reliable”, while integration tests are typically seen as “large”, “slow”, and “unreliable”. If you compare the test for one method of one class that doesn't do much with the test of the complete system including an external database, this is probably true. In the former case, most developers would probably agree that it is a unit test, and in the latter case, that it is an integration test. This is certainly not wrong. However, it's also not the whole truth. Let's take a look at a test for a Java class that solves the universally beloved FizzBuzz problem.

(All the code from this post is available in the GitHub repo fizzbuzz-testing-example to play around with.)

public class FizzBuzz {
    public List<String> go(int n) {
        return IntStream
            .rangeClosed(1, n)
            .mapToObj(this::fizzBuzz)
            .toList();
    }

    private String fizzBuzz(int n) {
        if (n % 3 == 0 && n % 5 == 0) return "FizzBuzz";
        else if (n % 3 == 0) return "Fizz";
        else if (n % 5 == 0) return "Buzz";
        else return Integer.toString(n);
    }
}

public class Main {
    public static void main(String[] args) {
        var fizzBuzz = new FizzBuzz();
        fizzBuzz.go(100).forEach(System.out::println);
    }
}

class FizzBuzzTest {
    @Test
    void oneIs1() {
        FizzBuzz fizzBuzz = new FizzBuzz();

        var result = fizzBuzz.go(1);

        assertEquals("1", result.getFirst());
    }

    // + several similar test cases
}

Probably everyone would agree that this is a unit test. It instantiates the class, calls a method that doesn't interact with other components, and checks the result, nothing more … but is that really true? What about IntStream? What about String and Integer? What about the Java compiler? And the Java runtime? All of this is necessary to actually execute the method FizzBuzz.go(int). If we leave out any of this, the program is no longer executable. A bug in any of these components could cause the test to fail. The test only runs successfully if all of these components work without error.

Let that sink in for a moment.

“But those things aren't mine! If my test depends on 3rd-party stuff, does that mean this is an integration test!?”

Yes, that's exactly what it means. More precisely: every test is an integration test, because every test integrates something; the only question is what. Thus, every unit test is also an integration test. It integrates everything that is part of the unit and leaves out everything that is not part of the unit.

In the example above, the test integrates the external dependencies I already mentioned, but also the methods FizzBuzz.go(int) and FizzBuzz.fizzBuzz(int). And the methods of the standard library, too.

So the interesting question is …

What is a unit?

For our purposes, a unit is a self-contained functional element with defined dependencies and inputs and outputs. Units can work together as part of a larger unit.

We'll expand the FizzBuzz example a bit to highlight some possibilities:

public interface Output {
    String render();
}

public record NumberOutput(int number) implements Output {
    @Override
    public String render() { return Integer.toString(number); }
}

public enum FizzBuzzOutput implements Output {
    FIZZ("Fizz"),
    BUZZ("Buzz"),
    FIZZBUZZ("FizzBuzz");

    private final String rep;

    FizzBuzzOutput(String rep) { this.rep = rep; }

    @Override
    public String render() { return rep; }
}

public interface Selector {
    Output select(int n);
}

public class Does {
    public boolean divide(int divisor, int n) { return n % divisor == 0; }
}

public record SimpleSelector(Does does) implements Selector {
    @Override
    public Output select(int n) {
        if (does.divide(3, n) && does.divide(5, n)) return FIZZBUZZ;
        else if (does.divide(3, n)) return FIZZ;
        else if (does.divide(5, n)) return BUZZ;
        else return new NumberOutput(n);
    }
}

public interface Streamer {
    Stream<Output> go(int n);
}

public record AscendingStreamer(Selector selector) implements Streamer {
    @Override
    public Stream<Output> go(int n) {
        return IntStream.rangeClosed(1, n).mapToObj(selector::select);
    }
}

public record FizzBuzz(Streamer streamer) {
    public List<String> go(int n) {
        return streamer.go(n).map((output) -> {
            if (output == FIZZBUZZ) return output.render().toUpperCase();
            else return output.render();
        }).toList();
    }
}

public class Main {
    public static void main(String[] args) {
        var fizzBuzz =
            new FizzBuzz(
                new AscendingStreamer(new SimpleSelector(new Does())));
        fizzBuzz.go(100).forEach(System.out::println);
    }
}

The new solution divides the code into several classes with distinct responsibilities. The class FizzBuzz now uses a Streamer that determines what number to start with and in what order to process the numbers. Streamer in turn uses a Selector to determine what should actually be output for each number. Selector in turn builds on the helper class Does[1], which handles the divisibility check. The outputs are returned in typed form. FizzBuzz itself now only needs to convert the Stream of Outputs into a list of Strings to keep the interface unchanged.[2]

However, we've built in one extension: we want to SHOUT “FizzBuzz” in capital letters and introduced an additional mapping in the FizzBuzz class for this purpose.

To plug the individual parts together, we use constructor-based dependency injection. The use of interfaces allows us to easily swap out implementations. It should be noted, however, that this separation of concerns is an internal implementation detail of the FizzBuzz class and has no impact on the interface it exposes to callers or its functionality. Except for the initialization code, the main method looks exactly the same as before.
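As a minimal sketch of such a swap, assume a hypothetical DescendingStreamer that counts down instead of up. It is not part of the original code. The other types are trimmed copies of the ones above, and the divisibility check is inlined into a lambda instead of going through Does, so that the snippet compiles on its own.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Trimmed copies of the types shown above.
interface Output { String render(); }

record NumberOutput(int number) implements Output {
    @Override public String render() { return Integer.toString(number); }
}

enum FizzBuzzOutput implements Output {
    FIZZ("Fizz"), BUZZ("Buzz"), FIZZBUZZ("FizzBuzz");
    private final String rep;
    FizzBuzzOutput(String rep) { this.rep = rep; }
    @Override public String render() { return rep; }
}

interface Selector { Output select(int n); }

interface Streamer { Stream<Output> go(int n); }

record AscendingStreamer(Selector selector) implements Streamer {
    @Override public Stream<Output> go(int n) {
        return IntStream.rangeClosed(1, n).mapToObj(selector::select);
    }
}

// Hypothetical alternative implementation: processes n, n-1, ..., 1.
record DescendingStreamer(Selector selector) implements Streamer {
    @Override public Stream<Output> go(int n) {
        return IntStream.rangeClosed(1, n)
            .map(i -> n - i + 1)
            .mapToObj(selector::select);
    }
}

record FizzBuzz(Streamer streamer) {
    List<String> go(int n) {
        return streamer.go(n)
            .map(output -> output == FizzBuzzOutput.FIZZBUZZ
                ? output.render().toUpperCase()
                : output.render())
            .toList();
    }
}

class SwapDemo {
    // Assumption for brevity: divisibility is checked inline rather than via Does.
    static final Selector SELECTOR = n -> {
        if (n % 3 == 0 && n % 5 == 0) return FizzBuzzOutput.FIZZBUZZ;
        if (n % 3 == 0) return FizzBuzzOutput.FIZZ;
        if (n % 5 == 0) return FizzBuzzOutput.BUZZ;
        return new NumberOutput(n);
    };

    public static void main(String[] args) {
        // Only the constructor argument changes; FizzBuzz itself is untouched.
        System.out.println(new FizzBuzz(new AscendingStreamer(SELECTOR)).go(5));
        System.out.println(new FizzBuzz(new DescendingStreamer(SELECTOR)).go(5));
    }
}
```

The first line printed is [1, 2, Fizz, 4, Buzz], the second is the same list in reverse order.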

For the purpose of testing this implementation, what are the relevant units here? The reflexive answer would probably be, “Each class is a unit.” For unit testing, this means we test each class individually and leave its dependencies out of the test. We use Mockito for this, along with AssertJ to simplify list and stream handling in the tests. Here are a few example tests:

@ExtendWith(MockitoExtension.class)
class SimpleSelectorTest {
    @Mock
    private Does does;

    private SimpleSelector selector;

    @BeforeEach
    void setUp() {
        selector = new SimpleSelector(does);
    }

    @Test
    void modNothingYieldsNumber() {
        when(does.divide(3, 1)).thenReturn(false);
        when(does.divide(5, 1)).thenReturn(false);
        assertThat(selector.select(1)).isEqualTo(new NumberOutput(1));
    }

    // …

    @Test
    void mod3AndMod5YieldsFizzBuzz() {
        when(does.divide(3, 1)).thenReturn(true);
        when(does.divide(5, 1)).thenReturn(true);
        assertThat(selector.select(1)).isEqualTo(FIZZBUZZ);
    }
}

This test checks the selection logic. It does so more directly than the test in the previous version because it no longer has to extract the result from a list. The class Does is mocked away. There are separate tests for that one.

@ExtendWith(MockitoExtension.class)
class AscendingStreamerTest {
    @Mock
    private Selector selector;

    private Streamer streamer;

    @BeforeEach
    void setUp() {
        streamer = new AscendingStreamer(selector);
    }

    @Test
    void returnsSelectedOutputsInOrder() {
        when(selector.select(1)).thenReturn(new NumberOutput(1));
        when(selector.select(2)).thenReturn(new NumberOutput(2));
        when(selector.select(3)).thenReturn(FIZZ);

        var stream = streamer.go(3);

        assertThat(stream).containsExactly(
            new NumberOutput(1),
            new NumberOutput(2),
            FIZZ);
    }
}

This test only checks the behavior of our Streamer implementation. The Selector is mocked away.

And last but not least, the test of FizzBuzz checks the behavior of only this class, with the output of the Streamer specified by Mockito:

@ExtendWith(MockitoExtension.class)
class FizzBuzzTest {
    @Mock
    private Streamer streamer;

    private FizzBuzz fizzBuzz;

    @BeforeEach
    void setUp() {
        fizzBuzz = new FizzBuzz(streamer);
    }

    @Test
    void aggregates() {
        when(streamer.go(1)).thenReturn(Stream.of(
            new NumberOutput(1),
            new NumberOutput(2),
            FIZZ));
        assertThat(fizzBuzz.go(1)).containsExactly("1", "2", "Fizz");
    }

    @Test
    void shoutsFIZZBUZZ() {
        when(streamer.go(1)).thenReturn(Stream.of(FIZZBUZZ));
        assertThat(fizzBuzz.go(1)).containsExactly("FIZZBUZZ");
    }
}

The tests all run successfully, and when we execute Main.main(), we see that everything also works together.

The approach “unit=class” (or sometimes also “unit=method”) is widespread among Java developers. I've seen this kind of test (at the class level with mocked dependencies) many times in different projects. And dependency chains are quite common. An example in a microservice based on the ports-and-adapters architecture could be: Web Controller → Driving Port → Application Service → other Application Service → Domain Object → another Domain Object → Driven Port → JDBC Adapter (arrows are runtime dependencies).

Of course, FizzBuzz in its entirety is also a unit. In the first implementation, all the code is in one class; in the second it is distributed across multiple classes, but it still forms a functional unit. In this example, the unit as a whole is quite small and can easily be tested in its entirety, but in larger code bases such whole-system tests can be quite slow (looking at you, @SpringBootTest). This is the point at which most developers begin to bandy about the word “integration test”, and negative vibes can be felt.

So, mocking it is. But what happens to tests in this style if we begin to make changes?

A small change

Let's assume that the customer for our industry-leading FizzBuzz solution wants to pave the way for a bright future in which FizzBuzz can not only “fizz” and “buzz”, but also “zoom” and “boom” and much more – and not just for the numbers 3 and 5, but for any other number. To achieve this, we will probably need to combine multiple numbers, or rather their mapped Outputs. This is where our enum-based approach will hit its limits. We would have to add not only ZOOM and BOOM as values, but also all possible combinations. To avoid combinatorial explosion, we introduce a CombinedOutput instead:

public record CombinedOutput(List<Output> outputs) implements Output {
    public CombinedOutput(Output... outputs) { this(List.of(outputs)); }

    @Override
    public String render() {
        return outputs
            .stream()
            .map(Output::render)
            .collect(Collectors.joining(""));
    }
}

And use this in our SimpleSelector:

<         if (does.divide(3, n) && does.divide(5, n)) return FIZZBUZZ;
---
>         if (does.divide(3, n) && does.divide(5, n)) return new CombinedOutput(FIZZ, BUZZ);

And in its test:

<         assertThat(selector.select(1)).isEqualTo(FIZZBUZZ);
---
>         assertThat(selector.select(1)).isEqualTo(new CombinedOutput(FIZZ, BUZZ));

We run the entire test suite and see that all tests pass. Great! That would be that. We're now well equipped for the introduction of ZOOM at 7 or whatever else may come our way. Quickly push to production and the sprint goal is achieved. Or the deliverable for this release can be signed off. Or something like that. In any case, we're done.


PAUSE

Before reading on, take a moment to think about what just broke! Because, yes, something did break. But I didn't lie. All tests ran without errors.


Found the mistake? No? Well, then, let's take a look. What happens when we run Main.main()?

1
2
Fizz

Looking good so far. It's fizzing.

4
Buzz

Buzzing, too.

Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz

And here's the “FizzBuzz”. But – wait a minute! Shouldn't it be “FIZZBUZZ”? You remember the SHOUTING requirement? And I know for sure there's a dedicated test for it. Where was it again? Ah, in FizzBuzzTest. Let's take a closer look at it now.

    @Test
    void shoutsFIZZBUZZ() {
        when(streamer.go(1)).thenReturn(Stream.of(FIZZBUZZ));
        assertThat(fizzBuzz.go(1)).containsExactly("FIZZBUZZ");
    }

If the streamer returns FIZZBUZZ, “FIZZBUZZ” should come out. So what's the problem? The problem is that in production the streamer no longer returns FIZZBUZZ at all, because we changed the selector to return a CombinedOutput. If the selector were still outputting FIZZBUZZ, the class FizzBuzz would still SHOUT. But it no longer does.

Because we focused each test so narrowly on its respective unit – or what we identified as a unit, i.e., the class – none of them was able to uncover this mismatch. A comprehensive integration test would have helped, but those are expensive and slow and are therefore only executed relatively late in the development cycle – or omitted entirely if the test coverage is already good. And it is good (100%).[3]
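The mismatch can be reproduced without Mockito. The following sketch is a condensed, self-contained copy of the classes above, with a hand-written lambda standing in for the mocked Streamer and the divisibility check inlined. The class MismatchDemo and its method names are made up for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

interface Output { String render(); }

record NumberOutput(int number) implements Output {
    @Override public String render() { return Integer.toString(number); }
}

enum FizzBuzzOutput implements Output {
    FIZZ("Fizz"), BUZZ("Buzz"), FIZZBUZZ("FizzBuzz");
    private final String rep;
    FizzBuzzOutput(String rep) { this.rep = rep; }
    @Override public String render() { return rep; }
}

record CombinedOutput(List<Output> outputs) implements Output {
    CombinedOutput(Output... outputs) { this(List.of(outputs)); }
    @Override public String render() {
        return outputs.stream().map(Output::render).collect(Collectors.joining());
    }
}

interface Streamer { Stream<Output> go(int n); }

record FizzBuzz(Streamer streamer) {
    List<String> go(int n) {
        return streamer.go(n)
            .map(output -> output == FizzBuzzOutput.FIZZBUZZ
                ? output.render().toUpperCase()
                : output.render())
            .toList();
    }
}

class MismatchDemo {
    // The selection logic after the change (Does inlined for brevity).
    static Output select(int n) {
        if (n % 3 == 0 && n % 5 == 0) {
            return new CombinedOutput(FizzBuzzOutput.FIZZ, FizzBuzzOutput.BUZZ);
        }
        if (n % 3 == 0) return FizzBuzzOutput.FIZZ;
        if (n % 5 == 0) return FizzBuzzOutput.BUZZ;
        return new NumberOutput(n);
    }

    // What the isolated FizzBuzzTest exercises: a streamer that still hands out FIZZBUZZ.
    static List<String> withStubbedStreamer() {
        Streamer stub = n -> Stream.of(FizzBuzzOutput.FIZZBUZZ);
        return new FizzBuzz(stub).go(1);
    }

    // What production exercises: the streamer fed by the changed selection logic.
    static List<String> withRealWiring() {
        Streamer real = n -> IntStream.rangeClosed(1, n).mapToObj(MismatchDemo::select);
        return new FizzBuzz(real).go(15);
    }
}
```

withStubbedStreamer() yields [FIZZBUZZ], exactly what the isolated test asserts, while the last element of withRealWiring() is the unshouted FizzBuzz.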

A real example

Of course, this is a contrived example. But the same kinds of errors also occur in real development projects. A case I've personally encountered concerned a duplicate check in a REST interface. There was a test for the database adapter that ensured a DuplicatePaymentException was thrown when a unique constraint was violated during insertion into the database. And there was a test at the API layer that ensured that in case of a DuplicatePaymentTransactionException an appropriate response (HTTP status code 409) was sent to the client. This project relied heavily on mocking to isolate the “units” in tests. Each of the tests looked plausible when viewed individually, and they were all successful. But clients in production kept getting HTTP status code 500 (which indicates an internal server error) instead of the expected duplicate error message, because the DuplicatePaymentException was not caught and handled. I only found the cause when I looked at both tests side by side. That's when I noticed that different (but similarly named) exception types were being used.
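Boiled down to a sketch, the situation looked roughly like this. Only the two exception names come from the real project. PaymentStore, PaymentApi, the 201 status for the success case, and the catch-all branch standing in for the framework's default error handling are assumptions made for illustration.

```java
// The two similarly named exception types.
class DuplicatePaymentException extends RuntimeException {}
class DuplicatePaymentTransactionException extends RuntimeException {}

// Hypothetical port in front of the database adapter.
interface PaymentStore { void insert(String paymentId); }

// Hypothetical API layer that maps outcomes to HTTP status codes.
class PaymentApi {
    private final PaymentStore store;

    PaymentApi(PaymentStore store) { this.store = store; }

    int post(String paymentId) {
        try {
            store.insert(paymentId);
            return 201;
        } catch (DuplicatePaymentTransactionException e) {
            return 409; // the case the API-layer test covered
        } catch (RuntimeException e) {
            return 500; // everything else ends up as an internal server error
        }
    }
}

class ExceptionMismatchDemo {
    // Runs the API against a store that fails with the given exception.
    static int statusWhenStoreThrows(RuntimeException e) {
        PaymentStore store = id -> { throw e; };
        return new PaymentApi(store).post("p-1");
    }
}
```

A store stubbed to throw DuplicatePaymentTransactionException produces 409, which is what the API test checked. A store that throws DuplicatePaymentException, as the adapter test demanded, produces 500.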

The problem with overly narrow unit tests

The fundamental problem here is the same as in the FizzBuzz example: the test of one unit makes assumptions about the behavior of another unit and hardcodes them into the test instead of verifying them as well. If the streamer returns FIZZBUZZ, then FizzBuzz does the right thing. But if it doesn't, the test is meaningless. If the adapter throws a DuplicatePaymentTransactionException, the API delivers the correct response. But if it throws a DuplicatePaymentException instead, we lose.

In both cases, we tried to make things independent of each other that are not independent of each other in reality. But if the behavior of one of our unit's dependencies changes, it would certainly be good to have a test that fails if the change invalidates our assumptions.

So back to giant integration tests after all? Not necessarily. Instead, I propose overlapping tests.

Overlapping tests

Such tests are unit tests, too. We just choose the unit a bit differently. So far, we have used exactly one class as a unit, with boundaries upwards (to the caller) and downwards (to the dependencies). If we draw a picture of the scopes of these tests, they are completely separated from each other. In the following diagram, each test frames the classes it covers, i.e., exactly one each. The arrows between the classes represent runtime dependencies.

The diagram shows the classes FizzBuzz, AscendingStreamer, SimpleSelector and Does as well as the corresponding test classes. FizzBuzzTest frames FizzBuzz. AscendingStreamerTest frames AscendingStreamer. SimpleSelectorTest frames SimpleSelector and DoesTest frames Does. The dependency arrow from FizzBuzz points to AscendingStreamer, the one from AscendingStreamer to SimpleSelector and the one from SimpleSelector to Does.

As we have seen, the problem with this strict separation is that the interactions remain untested. Overlapping tests solve this by expanding the scope of the tests (i.e., enlarging the unit under consideration), so that the interactions between the classes are included in the tests. Figuratively speaking, each arrow is contained in at least one test scope. For our project “Over-engineered FizzBuzz”, it could look like this:

The diagram shows the classes FizzBuzz, AscendingStreamer, SimpleSelector and Does as well as the corresponding test classes. FizzBuzzTest frames FizzBuzz, AscendingStreamer and SimpleSelector. AscendingStreamerTest frames AscendingStreamer, SimpleSelector and Does. SimpleSelectorTest frames SimpleSelector and Does, and DoesTest frames only Does. The dependency arrows from FizzBuzz to AscendingStreamer, from AscendingStreamer to SimpleSelector, and from SimpleSelector to Does now lie within the test frames.

And here is the code:

class SimpleSelectorTest {
    private final SimpleSelector selector =
        new SimpleSelector(new Does());

    @Test
    void oneIs1() {
        assertThat(selector.select(1)).isEqualTo(new NumberOutput(1));
    }

    // …

    @Test
    void fifteenIsFizzBuzz() {
        assertThat(selector.select(15))
            .isEqualTo(new CombinedOutput(FIZZ, BUZZ));
    }
}

Here, Does is no longer mocked away, which even makes the test code shorter.[4]

class AscendingStreamerTest {
    @Test
    void returnsSelectedOutputsInOrder() {
        var streamer =
            new AscendingStreamer(new SimpleSelector(new Does()));

        var stream = streamer.go(15);

        assertThat(stream).containsExactly(
            n(1), n(2), FIZZ, n(4), BUZZ, FIZZ, n(7), n(8), FIZZ, BUZZ,
            n(11), FIZZ, n(13), n(14), FIZZBUZZ
        );
    }

    private NumberOutput n(int n) { return new NumberOutput(n); }
}

Here, the selector is no longer mocked (and neither is Does – we could mock it, but we wouldn't gain any advantage by doing so). The test must cover all possible outputs that are important to us, which is why it goes up to 15. That's where FIZZBUZZ appears for the first time. This ensures that the streamer works for all Outputs that SimpleSelector returns. In the “mocking” test implementation, this was pointless, since the test itself specified what the streamer returned. It didn't contain the arrow, so to speak. This test does.

@ExtendWith(MockitoExtension.class)
class FizzBuzzTest {
    @Mock
    private Does does;

    private FizzBuzz fizzBuzz;

    @BeforeEach
    void setUp() {
        fizzBuzz =
            new FizzBuzz(new AscendingStreamer(new SimpleSelector(does)));
    }

    @Test
    void aggregates() {
        when(does.divide(3, 1)).thenReturn(false);
        when(does.divide(5, 1)).thenReturn(false);
        when(does.divide(3, 2)).thenReturn(false);
        when(does.divide(5, 2)).thenReturn(false);
        when(does.divide(3, 3)).thenReturn(true);
        when(does.divide(5, 3)).thenReturn(false);

        assertThat(fizzBuzz.go(3)).containsExactly("1", "2", "Fizz");
    }

    @Test
    void shoutsFIZZBUZZ() {
        when(does.divide(3, 1)).thenReturn(true);
        when(does.divide(5, 1)).thenReturn(true);

        assertThat(fizzBuzz.go(1)).containsExactly("FIZZBUZZ");
    }
}

In this test, Does is mocked, so the test doesn't reach all the way to the end of the dependency chain (otherwise it would be a complete integration test). However, it must reach far enough to cover the interesting behavioral differences in the dependencies. In our case, this means the selector must be included, since it determines which Outputs can arrive in FizzBuzz. This way, we get a test that fails when we change the selector result from FizzBuzzOutput.FIZZBUZZ to CombinedOutput and thereby break our SHOUTING feature.
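To watch this test scope catch the regression, here is a self-contained sketch that runs the shouting scenario against the selector before and after the change. OldSelector and NewSelector are illustrative names for the two versions of SimpleSelector, and an anonymous subclass of Does stands in for the Mockito mock.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

interface Output { String render(); }

record NumberOutput(int number) implements Output {
    @Override public String render() { return Integer.toString(number); }
}

enum FizzBuzzOutput implements Output {
    FIZZ("Fizz"), BUZZ("Buzz"), FIZZBUZZ("FizzBuzz");
    private final String rep;
    FizzBuzzOutput(String rep) { this.rep = rep; }
    @Override public String render() { return rep; }
}

record CombinedOutput(List<Output> outputs) implements Output {
    CombinedOutput(Output... outputs) { this(List.of(outputs)); }
    @Override public String render() {
        return outputs.stream().map(Output::render).collect(Collectors.joining());
    }
}

class Does {
    boolean divide(int divisor, int n) { return n % divisor == 0; }
}

interface Selector { Output select(int n); }

// SimpleSelector before the change.
record OldSelector(Does does) implements Selector {
    @Override public Output select(int n) {
        if (does.divide(3, n) && does.divide(5, n)) return FizzBuzzOutput.FIZZBUZZ;
        else if (does.divide(3, n)) return FizzBuzzOutput.FIZZ;
        else if (does.divide(5, n)) return FizzBuzzOutput.BUZZ;
        else return new NumberOutput(n);
    }
}

// SimpleSelector after the change.
record NewSelector(Does does) implements Selector {
    @Override public Output select(int n) {
        if (does.divide(3, n) && does.divide(5, n)) {
            return new CombinedOutput(FizzBuzzOutput.FIZZ, FizzBuzzOutput.BUZZ);
        }
        else if (does.divide(3, n)) return FizzBuzzOutput.FIZZ;
        else if (does.divide(5, n)) return FizzBuzzOutput.BUZZ;
        else return new NumberOutput(n);
    }
}

interface Streamer { Stream<Output> go(int n); }

record AscendingStreamer(Selector selector) implements Streamer {
    @Override public Stream<Output> go(int n) {
        return IntStream.rangeClosed(1, n).mapToObj(selector::select);
    }
}

record FizzBuzz(Streamer streamer) {
    List<String> go(int n) {
        return streamer.go(n)
            .map(output -> output == FizzBuzzOutput.FIZZBUZZ
                ? output.render().toUpperCase()
                : output.render())
            .toList();
    }
}

class OverlapDemo {
    // Hand-written stand-in for the mock: claims every divisor divides every number.
    static final Does ALWAYS_DIVIDES = new Does() {
        @Override boolean divide(int divisor, int n) { return true; }
    };

    // Same wiring as in the overlapping FizzBuzzTest: real streamer, real selector, fake Does.
    static List<String> shout(Selector selector) {
        return new FizzBuzz(new AscendingStreamer(selector)).go(1);
    }
}
```

OverlapDemo.shout(new OldSelector(OverlapDemo.ALWAYS_DIVIDES)) returns [FIZZBUZZ], while the same call with NewSelector returns [FizzBuzz], so the assertion in shoutsFIZZBUZZ would fail as soon as the selector changes.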

Exactly how to draw the scope boundaries for the tests to find a good middle ground between sharply delineated (too narrow) tests and all-encompassing integration tests is a matter of experience. However, there are a few useful heuristics.

A dependency with a stable interface can be mocked without giving up too much safety. The interface of Does is very stable since it is based on mathematical rules, and can therefore be mocked without running the risk of being surprised by changes. In this concrete example, this isn't actually all that useful, since the implementation is fast enough and we don't really gain anything through mocking. But if we imagine a web interface that's slow and a bit unreliable, but has a stable interface contract, it makes much more sense.

On the other hand, mocking a dependency with interesting behavior should not be taken lightly. In the case of FizzBuzz, the Selector implementation contains the most interesting behavior (selecting what to output). As you can see in the picture above, we could mock it in FizzBuzzTest and still have all arrows covered by at least one test. We do not, however, because its behavior arguably has a larger influence on the operation of FizzBuzz than, e.g., that of the Streamer implementation. By mocking the Selector implementation, we would run the risk of missing interesting interactions. Replacing FizzBuzzOutput.FIZZBUZZ with CombinedOutput in the Selector would still cause AscendingStreamerTest to fail, so we would notice that this change has some knock-on effects. But we would not directly see the most interesting one.

A larger example

The principle of overlapping tests can be applied not only to unit tests within a single piece of software, but also to integration tests between multiple services. An example of its successful application was an integration of multiple services with payment service providers via a dedicated payment component.

Each service had tests for itself, of course, and the payment component also had tests for itself. Large integration tests, where a service was integrated with the payment component and the payment service providers, were difficult because the connection to the payment service providers' test environments wasn't particularly stable and therefore such tests were often disrupted by temporary problems. That's why we, as developers of the payment component, agreed with the service teams that they would use fake payment methods for the majority of their tests. These fake payment methods were not hooked up to the payment service providers. This way, service tests could proceed without disruption. Since the test scopes overlapped, as shown in the following diagram, it was still ensured that the system as a whole worked as intended.

In this diagram, tests shown in orange were the responsibility of the payment component team, and tests shown in blue were the responsibility of the service teams.[5]

The diagram shows the overlapping test scopes of the payment component. Each individual module of the payment component has its own unit tests. Internal end-to-end tests cover the interaction of the internal building blocks with each other and with the database. Integration tests performed by the payment component team cover the entire payment component and the connections to the payment service providers, which unfortunately only work semi-reliably in the test environment. Connected services also have tests that are performed by the respective service teams. In these, however, the payment service providers and their adapter implementations within the payment component are disconnected and replaced by a fake payment method implementation. The diagram shows that even without an all-encompassing integration test, each dependency arrow between components is covered by at least one test.

In addition to the tests shown in the diagram, there were further overlapping tests within the payment component. And each building block in the diagram also consisted of multiple units, which were tested partly with sharp boundaries and partly overlapping. Overall, the payment component achieved very good test coverage this way – much better than could be achieved with completely isolated unit tests, even supplemented with “large” integration tests.

Conclusion

Unit tests and integration tests cannot be sharply separated from each other – and are certainly not mutually exclusive concepts. Units can be of different sizes and contain sub-units. On the one hand, a unit is distinct and separated from other units (at the same level of granularity), and on the other hand, it integrates its own sub-units. Considering only units at one level (e.g., only unit=class) and testing them in isolation leads to test gaps and consequently to bugs.

Overlapping tests close these test gaps and at the same time help to avoid elaborate, slow and expensive all-encompassing integration tests.

Happy testing!


Footnotes

  1. The names Does and divide may seem a bit strange, but they enable the client code to say if (does.divide(3, n)). This is a trade-off I'm willing to make. Anyway, it doesn't have a bearing on the point of this post. ↩︎

  2. I use record instead of class for SimpleSelector, AscendingStreamer and FizzBuzz for syntactic brevity only; not because I think these classes make good records. ↩︎

  3. Yes, the error would have been noticed if we had immediately deleted the superfluous enum constant FIZZBUZZ. Then FizzBuzzTest would no longer have compiled. But in large projects changes often aren't rolled out atomically across the entire code base, and it is common for old classes or values to stay around for a while. ↩︎

  4. The test of Does itself remains the same. It doesn't have any dependencies anyway. ↩︎

  5. In the diagram, the payment service providers lie only partially in the test scope, since their test environments come with various limitations that make it impossible to cover 100% of the functionality in integration tests. Yes, this caused problems. Yes, we also tested in production. ↩︎