Against TDD – NeoPragma LLC

Test-Driven Development (TDD) is a tool. To get value from a tool, it’s necessary to:

choose the right tool for the job; and
use the tool properly.

Circa 2019, there are numerous debates about whether TDD is useful. It seems to me many of the arguments against TDD boil down to one of the following:

trying to use TDD in situations where it is not the right tool for the job; or
using TDD improperly.

Choosing the wrong tool or using a tool improperly are certainly things that we fallible humans do from time to time. However, when people have experienced those things with respect to TDD, that isn’t what they say. Instead, they say TDD categorically does not help. It is inconceivable that they could have made a mistake. The tool they used must have been at fault. They are here to warn you against being harmed by that tool.

For more information on that line of reasoning, please see the articles, We Tried That And It Didn’t Work (me), and We Tried Baseball And It Didn’t Work (Ron Jeffries). In a nutshell, the logic goes like this: I once hit my thumb with a hammer; therefore, hammers don’t work.

You’re doing it wrong

No one likes to hear they’re doing it wrong, least of all when they’re doing it wrong.

What if you tried TDD and it didn’t “work” (whatever that means), but in fact the thing you tried wasn’t TDD at all? If you can set aside the certainty that you are incapable of making a mistake, just for the sake of discussion, then what would this sort of mistake look like?

A question on Quora in 2018 reads, “What do people not like about TDD (Test-Driven Development)?” Steven Grimm replied to the question with four bullet points. Here is the first:

If you’re practicing fine-grained TDD as it’s often taught, adding a new failing test case before you allow yourself to implement each incremental bit of program logic, you can easily end up with enormous suites of mostly redundant, relatively low-value tests that, unless you have been very careful, are tightly coupled with your initial implementation and will be a headache to maintain as the code evolves.

There’s a fair bit to unpack in all that, including:

an unvalidated assumption about how TDD is taught (“as it’s often taught”)
an unsupported assertion regarding the size of the resulting test suites (“enormous”)
an unsupported cause-and-effect statement connecting the use of TDD with poor test design (“mostly redundant”)
an unsupported assertion regarding the value of the unit test suite (“low-value tests”)
an incorrect assertion that the test-first approach couples the tests with the initial implementation (“are tightly coupled”)
an incorrect assertion regarding the general effect of TDD on code understandability and maintainability (“will be a headache to maintain”)

And all that is only the first of four bullet points of similar density. This isn’t to pick on him; the things he writes in that response are common objections to TDD that we hear and read everywhere. So, let’s examine each criticism to clarify whether and how it actually relates to TDD.

People who teach TDD include information about when it’s useful and how to use it effectively. It isn’t taught as a rote, mechanical repetition of the red-green-refactor cycle, as if developers were mindless automatons. That assertion is false. When people use TDD in that manner, they end up in whatever ditch they dig for themselves; but they would end up in that ditch regardless of what kind of shovel they used to dig it. TDD doesn’t “cause” good or bad design. Neither do any other tools or techniques. That’s on us.

When we use TDD properly, we end up with a unit test suite that covers as many meaningful examples of low-level functionality as we could think of during the development process. Will that be “enormous?” It’s hard to say, as the word “enormous” isn’t quantified. In my experience, working in languages like Java or C#, I often end up with five to ten times as much unit test code as production code; and that’s when I approach it in a minimalistic way.

Criticizing TDD on this basis is tantamount to saying it’s “wrong” or “bad” to cover the code properly, so that we don’t break production when we deploy. It strikes me as absurd. The size of the test suite is whatever it has to be to provide value. Sometimes that’s a lot, sometimes it’s a little. If you have only a few examples, it may look like less test code to maintain, but there are probably a lot of problems hiding in the void.

Test case design is a learned skill. The fact you’re more-or-less following the red-green-refactor cycle doesn’t automatically result in well-designed examples. No tool automatically does that for you. The problem is orthogonal to the choice of method or technique for software design. I suggest you think about the value of an example before you write it, whether you’re writing it before or after the code under test. Otherwise, you may end up with “low-value tests.”

Does TDD couple your test suite to an initial implementation of the production code, thus limiting your ability to evolve the design? I’ve heard this before, but I haven’t experienced it. When I ask people to show me how this happens in their work, they are usually willing to do so. What they show me is that they think TDD means writing the tests first only the first time you touch the code. After that, they change the production code first and struggle with broken examples. They don’t know how to use the hammer, so they hit their thumbs.

When you evolve your low-level design through TDD, you can’t break the tests by making a change to the code. By definition, you write (or change or delete) the test case(s) first, before you make the change(s) to the production code. It’s impossible to “lock in” your initial implementation, because you drive the evolution of the code by changing the examples before you change the production code. Otherwise, you’re not using TDD at all, and you can’t blame TDD for your results.

This is already getting a bit long, but Grimm’s reply to the Quora question covers a lot of common misconceptions that many people express about TDD. Let’s take a look at the second bullet point:

Related to the first point, applying TDD thoughtlessly can waste a lot of time. For most applications (with exceptions like spacecraft and SQLite) tests need to be evaluated on the basis of cost and benefit, and there will be some threshold beyond which the cost of writing a particular test exceeds the maximum possible value it adds. “Is writing a test for this worth it?” isn’t a question you can even ask in a strict TDD environment, though, so you end up paying the cost without even attempting to figure out the benefit.

Notice the false assertion: “Is writing a test for this worth it?” isn’t a question you can even ask in a strict TDD environment. That’s just flatly wrong. It’s a question we are expected to ask regardless of the development techniques we use. It’s a fundamental part of a developer’s job to ask that question. No one…no one…suggests that you write test cases (or any other code) mindlessly. Please don’t be misled by this sort of thing when you come across it!

The third and fourth bullet points in Grimm’s response contain the same content as the first two, but phrased differently. This sort of criticism of TDD is pretty common. Yes, of course we need automated test suites at multiple levels of abstraction. TDD doesn’t tell us we don’t need that. As far as I know, no one suggests that. None of this is really a criticism of TDD or any other tool or technique. The issues enumerated in the response are caused by careless development practices.

Ask me no questions and I’ll tell you no lies

On a more general level, I have to wonder about developers who don’t ask questions. Here we have an assertion that it isn’t permissible to ask questions about the value of a proposed unit test case. What else would such a developer not ask questions about? It seems to me we’re constantly asking questions about what we see in the code, and the test cases are part of the code.

Hey, should this catch block be empty? Isn’t that dangerous?
This routine doesn’t explicitly close the stream. How does the application handle that? Do we have tests in place for it?
This code looks like it’s asking for a thread race condition. Are we seeing this in production?
Are you sure we should add another else block here, or should we re-think this logic? It’s starting to look like a candidate for the State pattern to me. What do you think, esteemed colleague?
There’s a unit test here that only asserts the constructor is capable of instantiating an object. What’s the value of that? Can we delete it? Do we have other examples that check behavior? Should we add some?
We have 87 test cases scattered everywhere to make sure the code handles Social Security Number (SSN) correctly. SSN is represented as a Java String. Why don’t we define a class for SSN to encapsulate this stuff? We could delete 73 of these test cases, and the application would behave more consistently, too.
Hey, this Java code is doing monetary calculations with floating-point types. Isn’t that risky?
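To make the last question concrete, here is a minimal Java sketch of why floating-point money is risky. The class and method names (MoneyMath, sumAsDouble, sumAsDecimal) are hypothetical, invented for illustration; only java.math.BigDecimal and RoundingMode come from the standard library. Rounding to two decimal places with HALF_EVEN is an assumption, not a rule for every currency or business.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, for illustration only.
final class MoneyMath {
    private MoneyMath() {}

    // Adds prices held as binary floating point; small representation
    // errors can accumulate in the total.
    static double sumAsDouble(double[] prices) {
        double total = 0.0;
        for (double price : prices) {
            total += price;
        }
        return total;
    }

    // Adds prices parsed from decimal strings, so each amount is exact.
    // Assumption: amounts are kept to two decimal places.
    static BigDecimal sumAsDecimal(String[] prices) {
        BigDecimal total = BigDecimal.ZERO;
        for (String price : prices) {
            total = total.add(new BigDecimal(price));
        }
        return total.setScale(2, RoundingMode.HALF_EVEN);
    }
}

final class MoneyMathDemo {
    public static void main(String[] args) {
        // The double total is not exactly 0.30; the decimal total is.
        System.out.println(MoneyMath.sumAsDouble(new double[] {0.10, 0.20}));
        System.out.println(MoneyMath.sumAsDecimal(new String[] {"0.10", "0.20"}));
    }
}
```

An example that adds ten cents and twenty cents and expects exactly thirty cents is the kind of low-level check that exposes this problem early, whichever way the team decides to represent money.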

If it were true that TDD prevents you from thinking about your work, then I would agree with Grimm’s criticisms. Fortunately, it isn’t true. What worries me is that a lot of developers who aren’t familiar with TDD might assume the criticisms are valid.

No magic bullet

I’ve mentioned that choosing the right tool for the job is important. TDD isn’t a fit for every situation. It’s useful when we’re writing code by hand. When we’re using a code generator or assembling components from a library or writing “glue” code to interact with a framework or customizing a third-party package, it’s often advisable to start test-driving (or testing) one logical level “up” from the unit level. Some examples:

Comprehensive application code generator such as Oracle Application Development Framework (ADF). ADF generates a fully-functional CRUD application based on an Oracle database schema. It allows for UI customization and for dropping POJOs (plain old Java objects) into the request/response cycle, to support business rules along with basic CRUD operations. The POJOs should be test-driven, but there’s no value in trying to test-drive the whole solution.
Third-party package such as Siebel or SAP. Many solutions are built by customizing and configuring a third-party package. Some products allow for a degree of unit testing for custom-written code, such as SAP’s ABAP unit testing framework. Others aren’t designed to enable unit testing, such as Siebel. Testing has to be done at a higher level of abstraction than the unit level.
“Glue” code for using frameworks. Many applications use one or more frameworks. JavaScript frameworks for front-end or single-page webapps are a common example as of 2019. There are also numerous webapp frameworks, IoC containers, and other libraries for Java, Python, Ruby, and other languages. Application code has to interact with these frameworks and libraries, but the rule of thumb is to separate the concerns of framework interaction and business logic. The bits of code that interact with frameworks need not be unit tested explicitly, as there are no “interesting” behaviors at that level of abstraction. We can test-drive the business logic components without the need to instantiate the framework, if we separate concerns.
“Heavyweight” legacy tooling may not lend itself to TDD. For example, the 1990s-era Java development tool, JDeveloper, generates chunks of source code from underlying XML definitions. It can take several minutes to run a local build. When developers want to use TDD to achieve very short feedback loops (in the range of seconds), a tool like this makes it frustrating to follow the TDD cycle. As a practical matter, it’s often better to skip the unit test level, even if doing so exposes us to some risk.
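The separation of framework interaction from business logic described in the “glue” code item can be sketched roughly as follows. Everything here is hypothetical: LateFeePolicy, LateFeeHandler, the fee rule, and the use of a plain Map to stand in for a framework's request and response objects. No real framework API is implied.

```java
import java.util.HashMap;
import java.util.Map;

// Business rule with no framework dependencies. This is the part worth
// test-driving, and it can be exercised without starting any framework.
final class LateFeePolicy {
    private LateFeePolicy() {}

    // Hypothetical rule: 25 cents per day overdue, capped at 500 cents.
    static int feeInCents(int daysOverdue) {
        if (daysOverdue <= 0) {
            return 0;
        }
        return Math.min(daysOverdue * 25, 500);
    }
}

// Thin glue: translates a request into a call to the policy and wraps the
// result. A Map stands in for the framework's request/response types.
final class LateFeeHandler {
    private LateFeeHandler() {}

    static Map<String, String> handle(Map<String, String> requestParams) {
        int days = Integer.parseInt(requestParams.get("daysOverdue"));
        Map<String, String> response = new HashMap<>();
        response.put("feeInCents", String.valueOf(LateFeePolicy.feeInCents(days)));
        return response;
    }
}

final class LateFeeDemo {
    public static void main(String[] args) {
        Map<String, String> request = new HashMap<>();
        request.put("daysOverdue", "3");
        System.out.println(LateFeeHandler.handle(request));
    }
}
```

The handler contains no decisions of its own, so the interesting examples (zero days, a few days, the cap) all target LateFeePolicy directly. The handler would normally be covered by checks one level up from the unit level.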

What about properties or accessors? Some languages, like C# and Ruby, can generate the code for properties or accessor methods for you. Some development tools, like IntelliJ IDEA and Eclipse, can generate accessor methods for Java. My rule of thumb is that when I use these features, I don’t test-drive the accessor methods. When I hand-code the accessor methods, then I do test-drive them. Many people say if the code is simple, there’s no need to test-drive it. I disagree. Any time we hand-code something, there’s a chance we’ll make a mistake. The simpler the code, the less attention we pay to it, as we’re thinking about the bigger picture. All the more reason to test-drive even the simplest methods and functions, if we’re hand-coding.

Swing that hammer!

There are two flavors of TDD. One is called “classic style” or “Detroit school” TDD. It’s very useful for building up algorithmic code, and for driving inside-out design. The other is called “mockist style” or “London school” TDD. It’s very useful for building up applications that comprise numerous components that interact through APIs, and for driving outside-in design. In practice, developers switch between these approaches as appropriate in the course of their work. Clearly, critics of TDD who believe it’s just a mechanical process lack this level of familiarity with the technique.
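As a rough illustration of the difference, the Java sketch below shows one small piece of algorithmic code and one component that talks to a collaborator through an interface. All names (PaymentGateway, OrderTotal, Checkout, RecordingGateway) are hypothetical, and the test double is written by hand rather than with a mocking library so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Collaborator reached through an API (hypothetical).
interface PaymentGateway {
    void charge(String customerId, int amountInCents);
}

// Algorithmic code: a natural fit for classic-style examples that check
// a returned value.
final class OrderTotal {
    private OrderTotal() {}

    static int ofCents(int[] lineItems) {
        int total = 0;
        for (int item : lineItems) {
            total += item;
        }
        return total;
    }
}

// Coordinating code: a natural fit for mockist-style examples that check
// the interaction with the collaborator.
final class Checkout {
    private final PaymentGateway gateway;

    Checkout(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    void placeOrder(String customerId, int[] lineItems) {
        int total = OrderTotal.ofCents(lineItems);
        if (total > 0) {
            gateway.charge(customerId, total);
        }
    }
}

// Hand-rolled test double that records each call it receives.
final class RecordingGateway implements PaymentGateway {
    final List<String> calls = new ArrayList<>();

    @Override
    public void charge(String customerId, int amountInCents) {
        calls.add(customerId + ":" + amountInCents);
    }
}

final class TddStylesDemo {
    public static void main(String[] args) {
        // Classic style: assert on the state or value produced.
        System.out.println(OrderTotal.ofCents(new int[] {250, 199}));

        // Mockist style: assert on what was sent to the collaborator.
        RecordingGateway gateway = new RecordingGateway();
        new Checkout(gateway).placeOrder("c-42", new int[] {250, 199});
        System.out.println(gateway.calls);
    }
}
```

In a real code base the recording double would usually come from a mocking library, and the same developer would write both kinds of example in the same afternoon.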

The type system of the language you’re using influences the number of unit tests you need to provide meaningful coverage. A dynamically-typed language, like Python, Ruby, or JavaScript, or a legacy language that doesn’t have a robust type system, like COBOL, generally calls for more test cases than other languages because you can’t count on the data types being what they are supposed to be at run time. A language that has static typing, like C# or Java, doesn’t require as many unit test cases…if you actually use the type system. Remember the Social Security Number question above. A language with a stronger, more expressive type system, like Haskell, generally calls for still fewer unit test cases, as the type definitions take care of more potential run time issues than other languages.
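Here is a minimal Java sketch of what “actually using the type system” could look like for the Social Security Number question. The class name, the parse and masked methods, and the format-only validation rule are all assumptions made for illustration; real SSN validation rules are more involved than a three-two-four digit pattern.

```java
import java.util.regex.Pattern;

// Hypothetical value class: once an instance exists, the rest of the
// application can rely on it being well-formed.
final class SocialSecurityNumber {
    // Assumption: only the NNN-NN-NNNN shape is checked here.
    private static final Pattern FORMAT = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    private final String value;

    private SocialSecurityNumber(String value) {
        this.value = value;
    }

    // The single place where raw text is validated and normalized.
    static SocialSecurityNumber parse(String text) {
        if (text == null || !FORMAT.matcher(text.trim()).matches()) {
            throw new IllegalArgumentException("Not a well-formed SSN");
        }
        return new SocialSecurityNumber(text.trim());
    }

    // Shows only the last four digits, for display or logging.
    String masked() {
        return "***-**-" + value.substring(7);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SocialSecurityNumber
                && value.equals(((SocialSecurityNumber) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

final class SsnDemo {
    public static void main(String[] args) {
        System.out.println(SocialSecurityNumber.parse(" 123-45-6789 ").masked());
    }
}
```

With validation concentrated in parse, the examples for format handling live in one place. Methods that accept a SocialSecurityNumber instead of a String no longer need their own cases for blank, malformed, or untrimmed input.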

So, there are situations when we want to use TDD and situations when we don’t. There’s context around which flavor(s) of TDD are appropriate. There’s context around what has to be checked and what is handled automatically by the programming languages or the runtime environment. TDD isn’t a mindless, repetitive, mechanical process. It’s a development skill like any other, requiring knowledge, practice, and experience to do well.

Show me a study!

A second line of argument against TDD comes down to personal preference disguised as high-level academic or intellectual criticism. It often includes ad hominem attacks on people whose names are associated with TDD, blanket condemnation of those who use the technique as a “cult” or “religion,” and demands for scientific “proof” that TDD “works.”

There are a couple of problems with this.

First, I must wonder whether anyone depends on academic studies when they choose software development techniques. I have not yet met a software developer who refused to use a given technique until he/she felt it had been adequately “proven” through academic research. Most of the developers I have met use the techniques they were taught when they first learned about programming. They simply accepted the advice of their instructors or colleagues. The next time someone asks you for a “study” that “proves” TDD works, ask them to show you the study that convinced them to write code in the way they currently do. You will be rewarded with blesséd silence.

Second, the quality of academic research in the field of application software development is open to question. Read more about this issue in the article, All Evidence is Anecdotal (me). You can find studies about TDD and other good practices for software development. Nearly all of them fall into one of three categories:

The researchers did not understand what they were observing, and reached conclusions inconsistent with the results demonstrated in industry practice.
The experiment was set up poorly, and did not yield meaningful observations (but the authors drew conclusions anyway).
The number of observations was insufficient to draw any conclusions (but the authors drew conclusions anyway).

While the quality of research in many fields is high enough to let us depend on studies for information, this is not the case in the field of application software development. If you are in favor of TDD, you can find studies to support your view; if you are opposed to TDD, you can find studies to support your view.

When people demand studies to “prove” a software development technique “works,” it’s a sure bet they didn’t wait for a study before they started writing code in whatever way they currently write it. If you showed them 100 studies, they would reject each of them for one reason or another. They don’t really want to see a study. They are saying: “Go away and leave me alone. I don’t want to learn anything.”

Legitimate concerns about TDD

Another response to the same Quora question comes from Jeff Langr, a technical consultant who has been practicing TDD for quite a long time and has helped many developers learn it. He lists several legitimate concerns about TDD. I’ll snip out and paraphrase key points from his answer:

seems slower at the outset
represents a different way of thinking
hard to break old habits
doesn’t yield expected results, usually due to learning-curve issues for new practitioners
impatience to go ahead and code up an “obvious” solution

I would like to suggest that these are not legitimate concerns about TDD as such. They are legitimate concerns about starting to use TDD for people who are already experienced using other development methods.

Personally, I see the adoption of TDD by existing practitioners as a correction, not as the introduction of something new and strange. They should have been teaching this in schools all along.

I spent the first 24 years of my career writing code first, and hardly ever writing executable test cases (although there were some situations when I did so). From the moment I was introduced to TDD in 2002, I recognized its value. I used to tell people I had wasted the first 24 years of my career, and that if this sort of development practice did not catch on in the industry, I would find something else to do for a living. That’s the impact it had on me. Your mileage may vary, of course.

There’s an old saying that unlearning is harder than learning. That’s why people have difficulty adopting TDD. The more experienced they are, the more they have to unlearn. In fact, there’s nothing difficult about TDD. There are more-effective and less-effective ways to do it, but it isn’t “hard.” The hard part is unlearning what you already know.

Production focus, not delivery focus

There’s a long history in our line of work of people rushing through development to try to push code to production as fast as possible. As a result, there’s a long history in our line of work of production issues and unhappy customers.

Back in the Olden Days, when I worked on mainframe applications in the 1970s and 1980s, it was normal for the same people who built an application to support it in production. That changed at some point in the late 1980s or early 1990s. One team would build the application and “deliver” it. A different team dealt with all the production issues and unhappy customers.

In large organizations, there was often little or no communication between the two groups, and little or no opportunity for double-loop learning about how to design robust solutions that wouldn’t cause problems in production, or that would at least provide some help for support personnel to resolve issues. Development teams “delivered” the same flavor of crap to production with every release.

Today, there’s a strong focus on “delivery.” The Agile movement has emphasized cross-functional development teams that include skills like programming, analysis, testing, and design. The DevOps movement has emphasized automating the delivery pipeline. The testing community has emphasized automated functional checks and exploratory testing practices.

All this has been focused on delivery: How fast and how smoothly can a development team “deliver” the unwanted baby to the steps of the orphanage and walk away, leaving someone else to care for the child?

But there’s a problem with this focus. Year by year, the demands on companies to interact with their markets dynamically have been increasing. The technologies to support the flexible and rapid release of functionality to the market have evolved. We’re in the age of microservices now. Applications comprise many small components that interact through APIs. The runtime environment is often some flavor of “elastic cloud” infrastructure that adds and removes resources automatically based on demand. People use immutable server and phoenix server strategies to minimize configuration drift and make it a little harder to hack into systems.

Given that reality, it isn’t possible to test everything that could possibly go wrong prior to release. Our systems live in a world of gray failures and unpredictable shifts in workloads, as well as active hacking. We have to monitor production in real time, test in production, and immediately respond to emergent issues that affect customers. We don’t have time to open a support ticket and wait for it to flow through some sort of incident management process, or be prioritized on a “backlog” by a “product owner.”

We no longer have the luxury of time to deliver the unwanted baby to the orphanage and walk away. We have to get back to the model in which the same people who build the solution also support it.

The first question to ask a team nowadays is: “Can you release to production safely from whatever is in your mainline at any time, without fear and without a lot of additional preparation?”

That doesn’t mean we can forget about delivery. It shifts our focus from delivery to production. We want to use any and all techniques and tools that enable us to maintain the integrity of the production environment and deal with any emergent issues quickly and effectively. We have to consider new “-ilities” in our solution designs, such as resiliency and observability.

There are many practices and tools we have to learn, including those of us who are already experienced in the field. Like it or not, TDD is one of those practices. It helps us introduce changes to production without breaking things.

What about personal preference?

Based on my description of TDD, you might think I’m suggesting it’s mandatory. Actually, I consider people’s happiness and personal satisfaction to be more important. We don’t live to work; we work to live. And some people just prefer not to write code in a test-first way.

I know not everyone will “like” TDD, and they don’t have to give me any reasons or explanations for that. They are choosing to do their work less effectively than they could, but that’s not my call. It’s theirs. It’s yours. I only ask that you be aware of what you’re trading away, and not close your mind to it because it doesn’t feel good to say you’re trading anything away.

Notwithstanding the comments about microservices and cloud environments, it’s still true that many organizations aren’t operating that way. Traditional methods might be just fine in many cases. Those are good places for opponents of TDD to work.

You might think, based on my obvious support for TDD as a good practice, that I would expect anyone who learns it properly will react the same way as I did, years ago: [slaps forehead] “Why haven’t I been doing this all along?”

But I’ve seen a counterexample. One counterexample is sufficient to disprove a generality.

I was working with a large client several years ago, and one of their lead developers was strongly opposed to TDD. Neither I nor any of the other technical coaches could convince him to work that way. I paired with him on occasion, and saw that the quality of his work was very high. He tended to write skeleton classes and then fill them in little by little until all the functionality was in place. That’s a time-honored approach. But he didn’t test his code, except in a manual and ad hoc way. The initial implementation was sound, and that’s what he cared about. I would say one reason he could afford to stop caring at that point was that the development teams at that company were not directly responsible for production.

One day we were asking for volunteers from client staff to run a code dojo. He volunteered to run a dojo about TDD. With no help from any of the technical coaches, he prepared the session and facilitated it like an expert. He taught 24 of his colleagues how to do TDD properly. It turned out that he understood TDD on a deep level. He was able to overcome people’s objections, explain the value proposition, and put realistic context around the design and creation of test cases. He was as skilled at TDD as any of us coaches, and as skilled a teacher, mentor, and coach.

He simply preferred not to do his own work that way.

And that’s okay. I’m not preaching at you. I’m just trying to clarify some of the common misconceptions about TDD that you’re likely to find online and hear from colleagues who don’t have a good grasp of the topic. Make your own professional choices. Just make them based on knowledge and not on false assertions.