
TDD and software archaeology

Can you tell by looking at the unit test cases whether the code’s author wrote the tests first? During a TDD class I taught last week, one of the participants suggested that you can’t tell. Completed code and unit tests would look the same regardless of which had been written first. At the time I didn’t give the comment much thought.

I returned to the team I was working with at another client on Thursday, the last day of their iteration cycle. I picked up a small story so I could contribute something before the end of the day. It was a bug fix for JavaScript input validation logic in a webapp. Like many applications, this one uses date ranges to activate and deactivate certain data values that drive functionality. In this case, the input validation function neglected to ensure the expiration date was later than the effective date.

My first stop was the Jasmine unit test cases. Sure enough, there was no test case expecting the expiration date to be later than the effective date. I added three cases — effective date earlier than expiration date, effective date later than expiration date, and both dates the same. It was straightforward to change the date validation function to make the tests pass. Fortunately the code is very clean, so there was no induced work due to design debt.
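In Jasmine terms, the three cases looked something like this. This is a minimal sketch with made-up names rather than the client's actual code; isValidDateRange here is assumed to return true for an acceptable range and to treat equal dates as invalid:

```javascript
// Sketch only: isValidDateRange is a hypothetical stand-in for the
// app's real validation function.
describe('date range validation', function () {
  var jan1 = new Date(2014, 0, 1);   // months are zero-based in JavaScript
  var dec31 = new Date(2014, 11, 31);

  it('accepts an effective date earlier than the expiration date', function () {
    expect(isValidDateRange(jan1, dec31)).toBe(true);
  });

  it('rejects an effective date later than the expiration date', function () {
    expect(isValidDateRange(dec31, jan1)).toBe(false);
  });

  it('rejects an effective date equal to the expiration date', function () {
    expect(isValidDateRange(jan1, jan1)).toBe(false);
  });
});
```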

The participant’s observation may be correct in some cases, but I think in general we can tell whether the tests were written before the code by looking at the tests, the code, or both. In this case, the Jasmine tests told me the code had been written first. I didn’t have to look at the production code to know that. The tell-tales were:

  1. There are no happy path cases. If the code had been driven by the tests — that is, if the test cases had described the functionality desired in the code, and then the code had been written to make that happen — then all the functionality would have been described in test cases, and not only the exceptions. Code driven by this set of test cases would never submit the input form, because there is no test case that drives that behavior. Based strictly on the test cases, the production code should do nothing at all except report invalid input values. Yet this is not true — in production, the input form is submitted and the transaction is processed. Someone wrote the code to do that without reference to any test cases. (See the sketch after this list.)
  2. In every single test case, the message Jasmine displays when the case fails reads, “it returns false.” Had the code been driven by the test cases, this would mean the business purpose of each and every JavaScript function is to “return true.” Clearly, this is in no sense a “business purpose.” It’s hard to imagine a business stakeholder demanding that the software should “return true.” The software archaeologist in me suspects the author of the tests read through the JavaScript functions that already existed and wrote the test cases in terms of their return values. The test cases describe implementation rather than behavior. When code is driven by tests, the tests describe behavior.
  3. The nature of the defect suggests the original author wrote the code first and the tests second. With his/her focus on the individual input fields, it would be easy for the author to forget to write validation logic that involved multiple input values. By describing behavior through test cases first, it would be hard to forget such a basic date range validation.
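To make tell-tales 1 and 2 concrete, here is roughly the difference, with hypothetical names throughout. Tests written after the code tend to read like the first block; tests that drive the code tend to read like the second, happy path included:

```javascript
// After-the-fact style: named for the implementation's return value.
describe('validateExpirationDate', function () {
  it('returns false', function () {
    expect(validateExpirationDate('')).toBe(false);
  });
});

// Test-driven style: named for behavior, including the happy path.
// validForm and submitDataValueForm are stand-ins for whatever
// helpers the real suite would use.
describe('data value entry', function () {
  it('submits the form when all input values are valid', function () {
    var form = validForm();
    submitDataValueForm(form);
    expect(form.submitted).toBe(true);
  });

  it('reports an error when the expiration date precedes the effective date', function () {
    var form = validForm();
    form.effectiveDate = new Date(2014, 11, 31);
    form.expirationDate = new Date(2014, 0, 1);
    submitDataValueForm(form);
    expect(form.errors).toContain('expiration date must be later than effective date');
  });
});
```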

In this case, it was easy to see that the code had been written first by looking just at the unit tests. Writing the code first can also have a visible effect on the production code. Pairing on the same client’s back-end Java code recently, a teammate and I spent two days on a modification that should have taken no more than two hours. The code we had to work with had the following characteristics:

  1. Class with multiple responsibilities
  2. Long method
  3. Method with multiple functions
  4. Method operating at multiple levels of abstraction
  5. Reliance on side effects
  6. Tight coupling
  7. Embedded calls to static methods
  8. Deeply nested if/else structure with redundant tests of the same elements
  9. Hard-coded instantiation of collaborators
  10. Inconsistent naming conventions
  11. Duplicate and nearly duplicate code
  12. Stale comments
  13. Poor use of language’s type system — everything is a string

It’s unlikely the test cases were written before the production code because it would be very hard to come up with a concise test case whose simplest possible implementation is a massive hairball. Test-driving tends to result in a larger number of small methods that each perform just one function, rather than a smaller number of large methods that perform multiple functions. You can often tell by looking at the production code alone whether the tests were written before or after it.
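As a sketch of that relationship, again with made-up names: the simplest implementation that passes a concise test case is a small function with a single purpose, because nothing in the test demands anything more.

```javascript
// Illustrative names, not the client's code. The simplest code that
// satisfies a concise test case is one small, single-purpose function.
function expirationFollowsEffective(effectiveDate, expirationDate) {
  return expirationDate.getTime() > effectiveDate.getTime();
}

describe('expiration date rule', function () {
  var jan1 = new Date(2014, 0, 1);
  var dec31 = new Date(2014, 11, 31);

  it('is satisfied when the expiration date follows the effective date', function () {
    expect(expirationFollowsEffective(jan1, dec31)).toBe(true);
  });

  it('is violated when the expiration date precedes the effective date', function () {
    expect(expirationFollowsEffective(dec31, jan1)).toBe(false);
  });
});
```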

11 thoughts on “TDD and software archaeology”

  1. You forgot the most obvious case: the unit tests do not exist. 🙁

  2. If I had test-driven the blog post, I would not have neglected that obvious case.

  3. Hi, Dave,

    Thanks for another great post.

    I feel you’re comparing apples with oranges.

    In my experience, code written using TDD usually ends up with long methods, complex classes, deeply nested conditionals and all the evils you found in that code. One of the many problems with TDD is that you write a test, then the code, then another test against the same piece of code, and then you have to change that smelly piece of code again to pass the new test, automatically coupling the functionality of both iterations. When writing the code first, this rarely happens, as without the tests to blind us to the structure, we immediately see where decoupling should be employed.

    1. I can’t dispute someone else’s experience, so I must accept what you say as given. I will say that I’ve never seen such an outcome…but I haven’t seen everything. Your comment strikes me as especially odd coming from an ObjectMentor email account. In fact, it makes me suspect you might be yanking my chain. 😉

      Although I don’t want to leap to conclusions without having seen the situation first hand, I can’t help noticing your description of the TDD cycle seems to omit something. Coincidentally, the missing something is the very bit that helps us avoid ending up with poorly factored code. Funny how that happens, eh?

      It’s almost as if one falling domino pushes the next one, in a cause-and-effect way, until no domino remains standing. Then, when Mom yells at you for making a mess of the playroom, you can say “It wasn’t me! It was TDD’s fault!”

      While I haven’t seen the outcome you describe as a direct result of TDD, I have certainly met and worked with many programmers who think TDD is a bad idea, or who believe they’re practicing TDD while omitting the same small-but-useful bit of the TDD cycle. I suspect I would find your comment more compelling had I ever seen one of them deliver clean code.

      Ever.

      Even once.

      But I haven’t seen everything.

    2. Shmoo – wow, we must live in radically different universes (mine has the dark matter). TDD is Red – Green – Refactor. Without the Refactor part, which should always lead to simplifications, it’s not TDD.

      Funny thing: as I read your comments, my experience is the inverse. Test-driven code is usually simpler, with fewer if statements. One reason is that you have to write a test case to justify each if statement. Being the lazy person I am, I really don’t like writing extra test cases, so I don’t write if statements.

      Clearly you’ve seen different code than I have.

      Cheers
      Mark

      1. Hi Mark,

        I can relate to the bit about being lazy. One thing I appreciate about TDD is that it helps me avoid creating what Alan Shalloway calls “induced work” – extra work that isn’t really necessary – for myself and for others. Sadly, most programmers don’t seem to care very much how hard it is for others to deal with the code they leave behind, or even if they create extra effort for themselves the next time they have to modify their own code. I’m far too lazy for that.

  4. There could be another way to tell. Have you seen Keith Braithwaite’s measure?

    1. Haven’t seen Keith’s measure. Can you point me to it, please?

  5. Hi Schmoo–

    Good to hear from you again.

    “In my experience, code written using TDD usually ends up with long methods, complex classes, deeply nested conditionals and all the evils you found in that code.”

    This is a sad statement about coders in general. In my experience, most code looks that way, including most–not all–of the shops I visit where they claim to do TDD. I think you rightly point out a serious problem.

    Where have I seen well-structured code? Our team’s Smalltalk code (work I did in the mid-90s, i.e. pre-TDD) was good, and I saw a few other Smalltalk efforts that were also impressive. Of course Kent Beck was one of those leading the charge about tiny, well-named methods and such. But I also ventured into another team’s Smalltalk code and it was abysmal, an affront to the spirit of Smalltalk.

    (It’s interesting to me that many who’ve written extensively on design are often the ones who also promote TDD–Kent, Uncle Bob, Larman, me. Certainly there are others on the opposite side of the fence–Cope, for example. But I think Bob and Larman are good examples of folks who started elsewhere and underwent a conversion as they practiced TDD more.)

    I’ve probably been in a couple hundred sizable source bases since then (~60 different customers or employers in that time), a mix of some TDD and some not. I’d say 95% of the codebases were typical slop, painful to work in (and the devs complained daily). The rest–maybe 10–were very good and easy to work with. They were all TDD.

    I’ve not seen the problem you describe regarding how TDD creates too tight of a binding, or maybe I’m not understanding it well. Can you provide an example? Certainly, the iterative approach of TDD generates continual small amounts of rework, but I favor that tradeoff–usually a much more knowable quantity–over the unknown defects and the amount of tail-end corrective work that’s otherwise usually needed.

    I might go as far as to say what really matters is what the team agrees on in terms of what they want to allow in their system. But I don’t think it’s coincidence that the few teams I’ve encountered that’ve managed to keep the crap out of their system are TDD teams.

    I’ve done enough coding without TDD, about 17 years of it prior to learning about the practice. Since then–13 years ago–I’ve coded on small systems a few times without TDD, thinking the quality design smarts I’ve picked up over the years would keep me out of trouble. They didn’t–once the systems I was working on grew to even a few hundred lines, it was obvious that the quality was degrading, it quickly became hard and slow to do much about it, and my natural response was to shrug my shoulders.

    TDD won’t magically transform a team of mostly “design deviants” into crack coders without some education or outside help. But I’ve seen it help a sharp team keep better reins on the code they do produce, particularly as the size of the systems increases.

    Regards,
    Jeff

    1. Interesting. I like it because it shows how we can infer from metrics how the code was built. This makes it possible for people who lack the background to recognize the tell-tales on sight to identify sections of a code base that might benefit from analysis and remediation.
