What happened to the test automation pyramid?

In a February 2020 article, “Eviscerating the Test Automation Pyramid”, Seb Rose points out some of the issues that ensue when people misunderstand the intent of the test automation pyramid (or triangle) originally proposed by Mike Cohn and interpret the model too literally. He suggests that if we remove the various levels and labels that people have added on the inside of the triangle, and focus on the shape itself, we’ll come closer to the original intent – many tests of small scope and fewer tests of larger scope.

Among other sources, he quotes Ham Vocke of ThoughtWorks, who has summarized the intent as:

  • Write tests with different granularity
  • The more high-level you get the fewer tests you should have

I think that’s a good summary. The shape is what’s important. Specifying some fixed number of levels with particular names isn’t really the point.

It has become popular to disparage the test automation pyramid. Unfortunately, the people who disparage it make the same sort of logical error as those who follow the model rigidly – they don’t quite understand the intent. As a result, they sometimes throw out the baby with the bathwater.

Recently I was working with a team to review a small codebase they were modifying. It isn’t a very old codebase, but it already has some of the “traditional” issues.

One thing I noticed was a set of unit tests for code that doesn’t exhibit any “behavior” as such; it’s glue code for connecting the application to a framework.

The team had gone to considerable trouble to mock out dependencies so they could test this bit of code without starting a server. Whether this is worth the trouble really depends on circumstances, and is a judgment call. In this case, I didn’t see any value in the tests. It seemed to me they could have checked this at the integration test level, one step “higher” in the pyramid model. The setup for the unit tests was pretty complicated and non-obvious. I wondered how new team members would react to it in the future.

They were using Kotlin with ktor, kodein, and kotlintest, and had Java compatibility set to 1.8. This struck me as a little odd, given it was fairly fresh code. I updated those dependencies and replaced kotlintest with its successor, kotest. Running with Java 14 compatibility, we started to see this warning in the build:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by io.kotlintest.extensions.system.SystemEnvironmentExtensionsKt (file:...) to field java.util.Collections$UnmodifiableMap.m
WARNING: Please consider reporting this to the maintainers of io.kotlintest.extensions.system.SystemEnvironmentExtensionsKt
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

The warning traced back to their setup for the unit tests in question. I learned that, starting with Java 9, the JDK restricts a potentially risky way of modifying private members of classes via reflection; specifically, AccessibleObject.setAccessible(). It’s used by various packages affecting all the JVM languages, but not typically used directly by application code. In this case, it came in through a dependency of kotlintest’s extensions.
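To show the mechanism concretely, here is a minimal sketch of that kind of reflective write. The class and field names below are my own illustration, not the team’s code; the real extension does the equivalent against java.util.Collections$UnmodifiableMap’s private “m” field, which is the access the warning complains about.

```java
import java.lang.reflect.Field;

public class ReflectiveAccessDemo {
    // Stand-in for a class whose private internals a test library rewrites.
    static class Config {
        private String home = "/home/original";
        String home() { return home; }
    }

    // The same escape hatch the kotlintest extension applies to
    // java.util.Collections$UnmodifiableMap.m: make a private field
    // accessible via reflection, then overwrite it.
    static String overriddenHome() throws Exception {
        Config config = new Config();
        Field field = Config.class.getDeclaredField("home");
        field.setAccessible(true);       // warned about when the target
        field.set(config, "/home/fake"); // is a JDK-internal class
        return config.home();
    }

    public static void main(String[] args) throws Exception {
        // Reflecting into our own (unnamed-module) class is still allowed;
        // pointing the same code at JDK internals is what Java 9 began
        // warning about and what newer JDKs deny without --add-opens.
        System.out.println(overriddenHome()); // prints /home/fake
    }
}
```

On current JDKs this only works silently because the target is our own class; against JDK internals the same call produces the warning above, and recent releases refuse it outright unless the module is explicitly opened.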

Using deprecated features is a time bomb in the code. Eventually, one of three things will happen:

  1. The organization will continue to run the code as-is. They will not be able to update their Java environment to a supported level, and they will continue to run on back-versions of various dependencies, including Java itself. Some packages, Open Source projects, and COTS solutions will not run in the environment because of the obsolete dependency. They will continue to work around the problem with ever-increasing risk and cost.
  2. Other teams will copy the pattern they see in this code base to save themselves time and to try to stay consistent with existing code in the shop. At some point, people will have to remediate this code, and they will have a big job on their hands, as the pattern will have been replicated in many other places.
  3. Developers will stop running the tests and, possibly, decide that automated testing just isn’t worth the trouble. Then there will be increased business risk across the board in the organization, as teams will not provide test coverage for applications anymore.

The team was adamant that they needed these unit tests. I suggested they find another way to mock out the dependencies, if they really felt it was necessary. Personally, I wouldn’t have checked this at the unit level because there’s no behavior to check; application behavior first becomes checkable at the integration level.

At the unit level, all we can do is verify that our code makes the correct call to the framework. That makes it an implementation-aware test, and therefore somewhat fragile and not entirely reliable as a safety net for refactoring; besides, the problem it guards against would make itself known immediately in an integration test.
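To make that concrete, here is a minimal, hand-rolled sketch (all names hypothetical, not the team’s actual code) of what a unit-level check of glue code can observe: only that our code invoked the framework the way we expected.

```java
import java.util.ArrayList;
import java.util.List;

public class GlueCodeCallCheck {
    // Stand-in for the framework's registration API.
    interface Router {
        void register(String path);
    }

    // The "glue code": wires application routes into the framework.
    static void installRoutes(Router router) {
        router.register("/health");
        router.register("/orders");
    }

    // A recording fake: the only kind of observation available at the
    // unit level, since there is no application behavior to observe yet.
    static class RecordingRouter implements Router {
        final List<String> registered = new ArrayList<>();
        @Override public void register(String path) { registered.add(path); }
    }

    public static void main(String[] args) {
        RecordingRouter fake = new RecordingRouter();
        installRoutes(fake);
        // The test pins the implementation: change the registration
        // mechanism and this fails, even if the running application
        // would still behave correctly.
        assert fake.registered.contains("/health");
        assert fake.registered.contains("/orders");
        System.out.println("glue wiring verified: " + fake.registered);
    }
}
```

The assertion is about how the code is written, not what the application does; that is the fragility in question.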

But this kind of decision is a judgment call, and I respect other people’s judgment. It’s their world and they have to live in it their own way.

Why were they so worried about keeping these unit tests in place? I think it’s because they had exactly two “levels” of automated checks in their build – a unit level, with test cases of small scope and good isolation, and a full-blown end-to-end testing level with all external dependencies live, including Internet services they don’t control and local servers that have to be up and running.

They had nothing in between. Without those unit checks, any problem with the glue code would have resulted in some hard-to-track-down error “in the large.” I hope they will consider adding appropriate automated checks at intermediate levels of abstraction.
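One possible shape for such an intermediate-level check (all names here are illustrative, not the team’s code): exercise real application code in process, faking only the external boundary the team doesn’t control.

```java
public class IntermediateLevelCheck {
    // The uncontrolled external boundary (e.g., an Internet service).
    interface PaymentGateway {
        boolean charge(String account, long cents);
    }

    // Real application code under test, with its dependency injected.
    static class Checkout {
        private final PaymentGateway gateway;
        Checkout(PaymentGateway gateway) { this.gateway = gateway; }
        String placeOrder(String account, long cents) {
            return gateway.charge(account, cents) ? "CONFIRMED" : "DECLINED";
        }
    }

    public static void main(String[] args) {
        // Fake only the boundary we don't control; no live server,
        // no Internet dependency, but the layers in between actually run.
        PaymentGateway fake = (account, cents) -> cents < 10_000;
        Checkout checkout = new Checkout(fake);

        assert checkout.placeOrder("acct-1", 4_200).equals("CONFIRMED");
        assert checkout.placeOrder("acct-1", 99_000).equals("DECLINED");
        System.out.println("intermediate-level checks passed");
    }
}
```

A failure here points at the wiring between layers directly, instead of surfacing as a hard-to-diagnose error in a full end-to-end run.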

This is a small example of why we should remember the point of the test automation pyramid, even as we avoid using the model in a rigid or mindless way. Either extreme is risky.