The paper “Notation as a Tool of Thought” by Kenneth E. Iverson of the IBM Thomas J. Watson Research Center was published in the Communications of the ACM, Volume 23, Number 8, in August 1980. It’s available online at https://www.jsoftware.com/papers/tot.htm. Thanks to John Arundel (https://twitter.com/bitfield), who called attention to the paper in a tweet.
The paper deals with the way an appropriate notation influences people’s thinking in a given domain. Iverson was interested in the domain of mathematical computing, and he uses the APL language to illustrate his ideas. APL is an array-processing language Iverson developed in the 1960s, and it still has a following today. You can read more about it on the APL Wiki at https://www.aplwiki.com/, and you can try it out online at https://tryapl.org/.
In this piece, I’m going to stretch the point a bit and apply some of the principles of notation Iverson presents in his article to a different domain: unit testing of application software.
Principles of a “good” notation
This is far from the domain of mathematics, but I found the general principles of useful notation to be applicable. Iverson lists these principles:
- Ease of expressing constructs arising in problems.
- Suggestivity.
- Ability to subordinate detail.
- Economy.
- Amenability to formal proofs.
I will take some liberties to interpret these principles in a way that applies to the domain of unit testing application software.
For a software engineer writing unit test cases, ease of expressing constructs arising in problems refers to the ease with which she can write a unit test case. Does the notation make it cumbersome or tedious to write test cases, or does it naturally lend itself to the task?
In this context, suggestivity means the software engineer can intuit how to express additional test cases by reading existing ones.
Ability to subordinate detail refers to the notation’s utility for focusing in on the purpose of the test case without including too much “setup” or “background noise.”
Economy refers to how concise the notation is; not necessarily in the sense of being painfully short or abbreviated, but in the sense that the notation doesn’t require much supporting boilerplate code above and beyond the expression of the test example.
Those points are fairly easy to transfer to the domain of unit testing. The final one, amenability to formal proofs, may be less obvious in this domain.
The notion of formal proofs, in the mathematical sense, does not really apply to unit tests except perhaps indirectly. But there are some aspects of testing and test-driven development that benefit from some degree of formal analysis. I can think of these, at least:
- Heuristics to select unit test cases that should be executed following a modification to the code under test; a common feature of continuous testing tools such as Infinitest for Java or autotest and similar gems for Ruby.
- Development of property definitions to support property-based testing (PBT) with tools such as FsCheck for .NET or Hypothesis for Python.
Those activities are somewhat “proof-ish,” even if they don’t rise to the level of formal mathematical proofs. A test notation that facilitates them will be more useful than a notation that hinders them.
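The “proof-ish” flavor of property-based testing is easier to see with a concrete property. Here is a minimal sketch of the idea in Python, with no particular PBT library assumed: a property (here, that encoding a string to UTF-8 and decoding it back yields the original string) is checked against many randomly generated inputs. The `check_property` and `random_text` helpers are hypothetical, invented for illustration; real tools such as Hypothesis add far more sophisticated input generation and shrinking.

```python
import random

def encode(s):
    return s.encode("utf-8")

def decode(b):
    return b.decode("utf-8")

def check_property(prop, gen, trials=200):
    """Check prop against many generated inputs; return a failing input, or None."""
    for _ in range(trials):
        value = gen()
        if not prop(value):
            return value          # a counterexample disproves the property
    return None                   # no counterexample found (not a proof!)

def random_text():
    # Generate a random string; code points stay below the surrogate range.
    length = random.randrange(0, 20)
    return "".join(chr(random.randrange(32, 0x2FFF)) for _ in range(length))

# Property: decode is the inverse of encode, for any text input.
counterexample = check_property(lambda s: decode(encode(s)) == s, random_text)
```

If `counterexample` is `None`, the round-trip property held for every generated input; a real PBT tool would also shrink any failing input down to a minimal counterexample before reporting it.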
What’s the problem?
Why bother considering the nuances of test case notation at all, other than as an academic exercise for our free time?
People often think about programming languages as supporting a paradigm or model or style of programming. One common taxonomy of programming paradigms is:
- Imperative – Procedural (e.g. C, Cobol, PL/I)
- Imperative – Object-Oriented (e.g. Smalltalk, Java, C#)
- Declarative – Functional (e.g. Haskell, Clojure, F#)
- Declarative (e.g. SQL, Chef, IBM Job Control Language)
Some may quibble about those categories, but the basic idea is that there are different models or approaches to designing software, and languages that support those models or approaches. A software engineer accustomed to thinking in terms of a certain paradigm will find test notation consistent with that paradigm to be easier to use and more useful than a notation based on a different paradigm.
I have seen people struggle to use unit testing tools that were originally based on the mindset of a different programming paradigm than the application at hand.
For example, some of the first property-based testing (PBT) tools to become available for the Java language were ports of QuickCheck from Haskell. Several are available. They tend to be very challenging for Java programmers to learn and use. I think the main reason is that Java is an object-oriented language while Haskell is a functional language.
Carrying QuickCheck over from one paradigm to another is difficult. In contrast, jqwik is a PBT tool built from scratch with an object-oriented mindset. It is far easier for Java programmers to pick up.
Another example is IBM’s zUnit. zUnit is an implementation of the popular xUnit architecture, which had its origin in Kent Beck’s SUnit tool for Smalltalk. zUnit is meant to support procedural languages such as Cobol and PL/I that run on IBM mainframe systems.
The xUnit architecture is based on object-oriented thinking to support an object-oriented language. Ports of the tool for other object-oriented languages are extremely popular – JUnit (Java), NUnit (.NET), MUnit (Cold Fusion), PyUnit (Python), etc. In this case, porting an object-oriented tool to the procedural world is the paradigm-crossing problem.
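The object-oriented character of the xUnit pattern is easy to see in a small example. Here is a sketch using Python’s built-in unittest module (descended from PyUnit); the `greet` function is a hypothetical stand-in for code under test:

```python
import unittest

def greet(name):
    # Hypothetical code under test.
    return f"Hello, {name}!"

class GreetingTest(unittest.TestCase):
    """In xUnit, a group of test cases is a class; each test is a method."""

    def setUp(self):
        # Fixture state lives in instance attributes, re-created per test.
        self.friend = "James"

    def test_greeting_includes_the_name(self):
        self.assertEqual("Hello, James!", greet(self.friend))

suite = unittest.TestLoader().loadTestsFromTestCase(GreetingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Everything in this pattern – inheritance, per-test instance state, methods discovered by reflection – presumes an object model, which is exactly what a procedural Cobol program does not have.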
Mainframe programmers have had difficulty wrapping their heads around zUnit, as it is fundamentally based on an object-oriented model. It is a faithful implementation of the architecture, but fails on all five of Iverson’s principles of “good” notation, in the context of unit testing procedural code.
So, the challenge is to provide the benefits of example-based and property-based unit testing, and possibly other useful practices such as mutation testing, in a way that is intuitive and natural for each programming paradigm. Trying to force-fit a testing tool from one paradigm to another – functional to object-oriented, or object-oriented to procedural – has not served developers well.
Issues with available Cobol unit testing tools
Apart from the paradigm mismatch, zUnit has a couple of other issues with respect to unit testing. First, the scope of the executable test cases zUnit generates is far larger than that of a “true” or “pure” unit test case: it runs an entire executable program, supplying input files and capturing output files. Unit tests are very small and highly targeted at a small subset of a program’s logic. Generally, they conform to Michael Feathers’ “rules” for unit tests, documented here: https://www.artima.com/weblogs/viewpost.jsp?thread=126923.
Second, zUnit expects the code under test to exist already. It generates test cases based on the existing production code. That’s more-or-less okay for a test-after approach, but doesn’t support a test-first approach. In my view a useful unit testing tool facilitates either test-after or test-first, according to the developer’s preference and/or the specifics of a given situation.
If you have sufficient time and patience, you can try to hand-craft zUnit test cases before writing the corresponding production code. All I can say is you won’t enjoy it.
I don’t mean to pick on IBM. Other commercial products that support “unit” testing for mainframe Cobol have exactly the same issues – object-oriented approach force-fit to work with a procedural language; the assumption that people always take a test-after approach; and “unit” tests of significantly greater scope than “true” unit tests.
Using such tools, it is not possible for a Cobol developer to enjoy the same smooth workflow as developers working in other languages and on other platforms. Often, the end result is that the developers simply stop using the tool, and forego unit testing altogether.
Cobol Check and Iverson’s principles of “good” notation
What we’re after is a unit testing tool that:
- supports fine-grained testing of individual Cobol paragraphs – the conceptual equivalent of a single method in Java or a single function in F# – with no need for external resources such as files;
- supports either test-after or test-first development;
- “feels” and operates in a procedural style rather than an object-oriented style; and
- presents the Cobol developer with an intuitive syntax that meshes naturally with Cobol syntax.
It would also be convenient if developers could work independently, without a live connection to a mainframe system, until they are ready to upload their code for integration testing and functional testing in a real environment.
The open-source project cobol-check, which is a follow-on from the proof-of-concept project cobol-unit-test, is an attempt to provide Cobol developers with such a tool. Cobol-check was initiated in December 2020. As of this writing, it is at a very early stage of development. We are applying lessons learned with the proof of concept since 2014 to build a better unit testing tool for Cobol.
Ease of expressing constructs
The special syntax or APIs provided by unit testing tools can be considered domain-specific languages (DSLs) for the domain of unit testing. Does cobol-check offer a DSL that meets Iverson’s principles of “good” notation, given the use case of a mainframe-oriented Cobol developer? Let’s see.
Here’s a sample of a couple of test cases for a hypothetical “Hello, World!” program.
TESTSUITE "Greeting includes the user name when it is provided"

TESTCASE "When message type is greeting it returns Hello, James!"
SET MESSAGE-IS-GREETING TO TRUE
MOVE "James" TO WS-FRIEND
PERFORM 2000-SPEAK
EXPECT WS-GREETING TO BE "Hello, James!"

* You can include Cobol-like comment lines, if you wish

TESTCASE "When message type is farewell it returns Goodbye, James!"
SET MESSAGE-IS-FAREWELL TO TRUE
MOVE "James" TO WS-FRIEND
PERFORM 2000-SPEAK
EXPECT WS-FAREWELL TO BE "Goodbye, James!"
How does this syntax support the idea of ease of expressing constructs arising in problems? “Problems” in this domain consist of fine-grained, executable examples that exercise small sections of a Cobol program.
Given the typical arrange-act-assert pattern for executable examples, the first test case uses standard Cobol statements to arrange the test:
SET MESSAGE-IS-GREETING TO TRUE
MOVE "James" TO WS-FRIEND
You might surmise from this snippet that the production code normally gets the value of WS-FRIEND from an I/O operation, such as an ACCEPT or a file READ. Note that cobol-check doesn’t have a dependency on any external resources such as the console or files.
Similarly, the act step is nothing more than a standard Cobol statement:

PERFORM 2000-SPEAK
In the assert step, we finally see DSL-specific keywords: EXPECT and TO BE. The assertion is straightforward and easy to understand, even without reading any documentation or working through a tutorial.
EXPECT WS-GREETING TO BE "Hello, James!"
The idea is to enable the developer to think about the problem at hand rather than using precious brain cells trying to wrestle with a test DSL that doesn’t fit the familiar paradigm or that requires special effort to set up test cases, including initializing external resources such as files.
I would say this DSL satisfies Iverson’s first principle of “good” notation, in context.
Suggestivity
We’ve adapted the definition of suggestivity to the unit testing domain by understanding it to mean that developers can intuit how to write additional test cases by reading existing ones.
In code dojos at client companies where we used the proof-of-concept project, cobol-unit-test, with Cobol programmers, we found that they related to the DSL immediately. They did not find it in any way confusing, and were able to guess fairly accurately how to write more unit test cases based on a sample case provided to “seed” the coding exercise.
Based on that, I would say this DSL satisfies Iverson’s second principle of “good” notation.
Ability to subordinate detail
We adapt Iverson’s definition of this principle to refer to the understandability of the DSL by Cobol developers – people already familiar with Cobol syntax and procedural programming.
Cobol-check generates standard Cobol code that corresponds to the test cases written in its own DSL. That code is pretty verbose and would be difficult to live with if we had to maintain it by hand, or if we wrote something equivalent by hand.
Thus, the notation as such improves the readability and hence the maintainability of test suites. Boilerplate code necessary to make the test cases run is not visible to the developer. It is generated by the tool based on the expression of examples written in the DSL.
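To make the idea of subordinated detail concrete, here is a toy sketch in Python of how a single DSL assertion might expand into verbose checking and bookkeeping code. The grammar handled and the generated UT-* field names are hypothetical, invented for illustration; this is not cobol-check’s actual generator or output.

```python
import re

def expand_expect(dsl_line):
    """Expand one EXPECT statement into verbose Cobol-style checking code."""
    m = re.match(r'EXPECT (\S+) TO BE "(.*)"', dsl_line.strip())
    if m is None:
        raise ValueError(f"not an EXPECT statement: {dsl_line!r}")
    field, expected = m.group(1), m.group(2)
    # Hypothetical generated code: counters and branching the developer
    # would otherwise have to write and maintain by hand.
    return "\n".join([
        "ADD 1 TO UT-TEST-CASE-COUNT",
        f'IF {field} = "{expected}"',
        "    ADD 1 TO UT-NUMBER-PASSED",
        "ELSE",
        "    ADD 1 TO UT-NUMBER-FAILED",
        f'    DISPLAY "EXPECTED " "{expected}" " BUT WAS " {field}',
        "END-IF",
    ])

generated = expand_expect('EXPECT WS-GREETING TO BE "Hello, James!"')
```

One readable line in the test source stands in for seven generated lines of counters and branching – detail the developer never has to see or maintain.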
I would say this DSL satisfies Iverson’s third principle of “good” notation.
Economy
One language I enjoy working with is Kotlin. I normally use the Kotest unit testing library alongside Kotlin. The library supports a number of very useful features, but in one sense its developers may have gotten a bit carried away: Kotest supports ten different styles of test layout. All of them are in the nature of syntactic sugar; under the covers, there is only one test engine.
For this reason, I think the Kotest DSL doesn’t meet Iverson’s principle of economy. We like having options, but ten different formats for unit test cases seems like too many, in my opinion. A developer who normally uses one format might not immediately relate to test cases written in a different format.
What we’re aiming for is concise notation that doesn’t fill the page with unnecessary text while also not going beyond the point of diminishing returns with respect to abbreviations and special symbols. Readable, but not verbose.
I think cobol-check achieves this goal. You can probably visualize a very wide range of different unit test scenarios that could be checked using the same pattern of code as shown in the sample above. The majority of test cases would be in the neighborhood of 3 to 6 lines long. Most of the statements are plain Cobol statements, and the DSL statements are generally self-explanatory.
Amenability to formal proofs
Formal proofs are highly relevant to APL as a language for expressing mathematical problems. We adapt this principle to the domain of unit testing by taking it to refer to the ease with which one could write a tool that analyzes cobol-check test cases for some practical purpose.
We can suggest three reasons to do this: (a) to support test case selection for a continuous testing tool, (b) to generate default properties for a property-based testing tool, and (c) to analyze the code under test for a mutation testing tool.
None of these things has been done with cobol-check to date, although all three are future possibilities. Nevertheless, we can examine the DSL notation and reason about how difficult it might be to write such tools.
It is easy to identify the programs under test and the specific paragraphs within each program that are relevant to each test case. That should facilitate the development of continuous testing tools and property-based testing support. By analyzing the test case setups and expectations, a mutation testing tool could determine the modifications that should be made to the code under test to provide meaningful mutation testing.
My conclusion is the DSL for cobol-check satisfies all of Iverson’s principles of “good” notation, when considered in the context in which the tool is meant to function.