Microtests and Unit Tests for z/OS Applications

The idea of “unit testing” is pretty well-known in programming circles. Everyone has some concept of what it means and most software developers practice some form of unit testing.

Yet, there is disagreement about unit testing. If you’re trying to get a handle on this topic for purposes of supporting existing applications in a z/OS environment, you may find a lot of contradictory information online. Opinions are often presented as facts, and are defended strongly. I’d like to try to tease some of that apart so you can make practical sense of it.

Disagreements or misunderstandings seem to fall into a handful of categories:

What is the scope of a unit test?
Is there any value (categorically) in unit testing in the first place?
Assuming “yes” to the previous question, in what situations do unit tests add value?
Are unit tests really “tests” at all?
Are unit testing and test-driven development the same thing?
Who is responsible for writing and maintaining unit tests?
What’s the value of “automated testing” vs. “manual testing?”
Frequency, cost, and risk of change
How does the type system of a programming language affect the necessity or value of unit tests?
When should we use example-based tests vs. property-based tests?
What are some practical guidelines for the design of unit tests?
How does all this apply to “legacy” code running on z/OS?

What is the scope of a unit test?

The first area of confusion has to do with the definition of unit. It’s fairly intuitive to say a “unit test” is a test of a “unit of code.” But what, exactly, is a “unit of code?”

People have been testing units of code for decades. The scope of these tests has depended on a number of factors over the years, including the characteristics of the programming language in use, how feasible it has been to isolate a chunk of code for purposes of testing, and (in the long-ago) the cost of running programs, which used to be higher than the cost of programmers.

As a result of these variations, the term “unit” has no standard definition. One person might consider a “unit” to constitute a much larger chunk of code than the next person, and neither of them is wrong.

As of today, proponents of unit testing tend to consider a “unit” to be a very small chunk of code. Writing in 2005, Michael Feathers proposed some characteristics of “true” or “pure” unit test cases. Later, Michael “GeePaw” Hill coined the term microtest in an effort to distinguish the modern concept of a “unit” from the conventional, looser definition, in part because it’s difficult to practice test-driven development using test cases of large scope, but also because in general smaller test cases provide more-useful results than larger ones.

Here are some online references on this topic:

How does this apply in the z/OS environment? We already use the term “unit” for a couple of well-defined things: compilation unit and run unit. Unit testing involves exercising some “chunk” of code in isolation from other parts of the application and from external dependencies such as datasets and databases. It’s a slightly different flavor of the word “unit.”

Because of the nature of mainframe technologies and traditional languages, the smallest chunk of code that has been practical to execute in isolation is a whole executable or load module. Tooling such as JUnit for Java or RSpec for Ruby has not been available for unit testing code written in Cobol or PL/I. As a result, most mainframers’ concept of “unit” does not get down to the fine-grained level contemporary practitioners usually mean when they say “unit test.”
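
To make the difference in scope concrete, here is a minimal sketch of what a microtest looks like. It is written in Python only because the example is short in that language. The late-fee rule, the function name, and the numbers are all invented for illustration; they do not come from any real application.

```python
# Hypothetical business rule, invented for this illustration:
# charge 1% of the balance for every full 30 days overdue, capped at 5%.
def late_fee(days_overdue: int, balance_cents: int) -> int:
    """Return the late fee in cents for an overdue balance."""
    if days_overdue <= 0 or balance_cents <= 0:
        return 0
    periods = min(days_overdue // 30, 5)
    return balance_cents * periods // 100


# Each microtest checks one behavior of one small chunk of code.
# No dataset, no database, no JCL: just inputs and an expected result.
def test_no_fee_when_not_overdue():
    assert late_fee(0, 100_000) == 0


def test_fee_grows_with_each_full_30_days():
    assert late_fee(61, 100_000) == 2_000


def test_fee_is_capped_at_five_percent():
    assert late_fee(400, 100_000) == 5_000
```

Each check exercises a single rule in isolation and runs in a fraction of a second. When the smallest executable chunk is a whole load module, learning the same three facts means preparing input files, running the job, and inspecting the output.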

Is there any value (categorically) in unit testing in the first place?

Some software developers swear by unit testing while others insist the practice offers no value at all. All of them are experienced and knowledgeable professionals, so how can they have such vastly different perspectives on unit testing?

Part of the confusion stems from the loose and inconsistent definition of a “unit” of software, and part stems from differences in approach; those who consider fine-grained testing of software-in-progress to be important will tend to support unit testing, while those who prefer to test their code “in the large” to obtain “meaningful” test results will tend to discount the value of unit testing.

So, this is another topic in which nobody is “wrong.” That statement in itself is probably unhelpful for you. Is there value in unit testing in your situation – supporting existing applications written in traditional mainframe languages like Cobol, PL/I, and Assembler?

There are a few additional factors to consider to help you make that judgment. In the meantime, here are some references from people who consider unit testing to be waste:

I would like to offer an opinion at this point: Reading the material “against” unit testing, it seems to me the issues people tend to cite boil down to inappropriate test case design (cases are too aware of internal implementation details rather than focusing on observable behaviors of the code), tight coupling of logic in the application under test (leading to too much dependency on stubs and mocks), and/or inappropriate scope for the unit tests (almost always too large).

In a sense, then, people are creating their own problems. Rather than asking how they might improve their unit testing, they dismiss the practice categorically and advise the entire world to ignore it. This reaction is unhelpful, in my opinion. A more practical approach (in my view) is to understand that unit testing may or may not be valuable in your context. I advise you to assess your needs with your own brain rather than accepting anyone else’s conclusions (including mine).

People also have negative experiences when they attempt to write unit test cases for software that doesn’t lend itself to testing at that level of abstraction, such as COTS packages (e.g. Siebel or Avaya) or applications generated by an automated tool (e.g. Oracle ADF). Based on these experiences, some people declare that unit testing is categorically useless. But those experiences are not the only possible experiences.

It seems to me that unit testing, like many other aspects of our work, may have value depending on context, and it is not helpful to make blanket declarations about it. Still, I would like you to read and understand the arguments against unit testing. Their arguments may resonate with you.

On the other hand, the arguments against unit testing may (unintentionally) provide guidance in how to get value from the practice. For instance, a unit testing practitioner who finds it necessary to write a lot of mocks or stubs will react not by writing a book against unit testing, but by refactoring the code under test to make it more amenable to fine-grained testing.

In what situations does unit testing add value?

Different people have different opinions and experiences about this. Personally, I have found value in unit testing when I’m writing or modifying an application that is “hand-written” as opposed to generated by a code generator. So, when working on a webapp that is based on a framework (like JSF for Java, Django for Python, or Rails for Ruby), I find value in unit testing the components of the solution that are hand-written. I find little value in attempting to unit test the parts of the solution that are provided by a framework.

Apart from that, I have experienced mixed results when combining unit testing and/or test-driven development with other practices that were deemed “good” on their own merits, but that conflicted with each other. In a blog post in 2012, I described one such project in a tongue-in-cheek way: A Recipe for Software Development.

In that project, we attempted to apply three different good practices: Extreme Programming, UX-Centric Design, and a code generator. Whether due to the nature of these practices or the way we used them, any two of the three were in conflict with each other in one way or another. In any case, the experience suggests unit testing, like any other practice, should be considered in context and not in a dogmatic way (either for or against).

In general, the rule of thumb I tend to follow is that unit testing is useful when writing code by hand, and possibly less useful when producing a solution through a code generator or framework.

When supporting existing applications in a traditional z/OS environment, nearly all your code is hand-written. Irrespective of the availability of suitable tooling for unit tests, your situation is that you are supporting hand-written code. Something to consider.

Are unit tests really “tests” at all?

The word test is overloaded in our line of work, just as the word unit is overloaded. Any time we exercise a chunk of code to learn something about how it works, we say we are “testing” the code. But there is a difference between testing software to learn how it might behave under some set of conditions and testing it to verify that it behaves in a certain predefined way.

To distinguish these ideas, software testing specialists often use the term checking to refer to activities intended to verify predefined behavior, and the term testing to refer to activities intended to expose new information about how a system may behave under various operating conditions.

Here are some references on this topic:

The industry as a whole has not adopted the term checking. People talk about unit testing and test automation and test-driven development, when these practices are really checking that the software produces the expected result when executed under controlled conditions.

I mention this because, when you are reading about unit testing or listening to advice about it, you may hear very different comments depending on the assumptions the writer or speaker makes about these words, unit and test.

Are unit testing and test-driven development the same thing?

The short answer is “No.” A longer answer would distinguish between the activities of building up the code through emergent design and checking that the code does what we think it should do under controlled conditions.

In my experience, there’s considerable overlap between those two activities. I find myself using unit testing tools and techniques both when developing/designing/coding and when checking the behavior of existing code. I also find that the unit test suite that emerges through test-driven development also serves as a regression test suite after the fact. This makes it difficult to draw a hard line between unit testing as a development/design practice and unit testing as a testing or checking practice.

If you have always worked in a mainframe environment, it’s unlikely you have practiced TDD in the “true” sense, unless you have developed new applications in Java to run on an LPAR configured with Linux on Z. On the “MVS side of the box,” where procedural languages like Cobol and PL/I live, the tooling to support “true” TDD has not existed. Even when you use zUnit with Rational Developer, you are not able to get down to the fine-grained level TDD calls for.

When you read or hear about issues with TDD, a common complaint is that when programmers change the code under test, the change breaks their unit tests. That type of comment is a red flag. It is literally impossible, by definition, to break a test by changing production code when you use TDD.

Why? Because the thing that drives goes first, and the thing that is driven is dragged behind it. If you are test-driving your code, you write or modify the test cases first, and then modify the production code to make the cases pass. You cannot break a test case by modifying production code, because by definition you do not modify the production code before you have set up your test cases to reflect the system behavior you desire.
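
The ordering can be sketched in a few lines of Python. The function name and the three status values are hypothetical, made up for this illustration. The check appears first in the file because, in TDD, it is written first.

```python
# Step 1 (written first): the check states the behavior we want.
# Run at this point, it fails with NameError because classify_balance
# does not exist yet. That failure is expected and intended.
def check_classify_balance():
    assert classify_balance(-1) == "OVERDRAWN"
    assert classify_balance(0) == "EMPTY"
    assert classify_balance(2500) == "OK"


# Step 2 (written second): just enough production code to make the check pass.
def classify_balance(balance_cents: int) -> str:
    if balance_cents < 0:
        return "OVERDRAWN"
    if balance_cents == 0:
        return "EMPTY"
    return "OK"


# Step 3: run the check again. It now passes silently.
check_classify_balance()
```

When the desired behavior changes, the same order applies: change the check first, watch it fail, then change `classify_balance` until it passes.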

Anyway, whether you are using a unit test case to help you drive development or to help you check the behavior of existing code, it’s the same test case. This seems to cause confusion and leads to endless circular debates online.

Who is responsible for writing and maintaining unit tests?

IBM mainframes are generally used by very large organizations. Large organizations usually divide up the work into functional silos, such that one group of people writes application code and a separate group of people tests the application.

In some organizations, the work is chopped up into even more pieces than that. I have seen companies where a separate team writes all the JCL and none of the code, while developers write only code and no JCL. That arrangement seems untenable, as it’s rather difficult to run your code without writing JCL to run it. It’s like saying, on a Linux or Unix system, one person can write a program but a different person must enter the command to execute it; or on a Windows system, one person can write a program but a different person must click the icon to execute it.

In reality, the programmers do write JCL and they do test their code as best they can. They don’t want the overhead of several back-and-forth rounds to communicate what the correct JCL should be. They don’t want the embarrassment of having a tester discover a stupid mistake in the code they should never have handed off in the first place.

But it is what it is. The situation leads to confusion about who is responsible for writing tests. It goes back to the problem of referring to checking as testing. One way some people deal with the inconsistent terminology is to call unit tests programmer tests. The implication is these tests are the responsibility of the people who write code, as opposed to other forms of testing that may be relegated to a separate group or team.

Unit tests, as I use the term, are programmer-written, fine-grained functional checks. The programmer or “developer” is responsible for these tests. Writing them is part of the development process, not part of the formal testing process that may include end-to-end testing and system testing. The unit tests are very small in scope and they exercise chunks of code in isolation from the rest of the solution. They do become part of the overall set of test suites for the application, and that may account for some of the confusion, as well.

What’s the value of “automated testing” vs. “manual testing?”

This question comes up frequently, and it is another artifact of the inconsistent terminology we use. Usually when people say “manual testing” what they mean is functional checking, but carried out by “manually” setting up test data and test scripts and stepping through them, rather than by writing executable test cases. When they say “automated” testing, what they mean is functional checking in the form of executable test cases, which may or may not actually run automatically when changes are committed to version control. So, there is a certain lack of precision in the way we use these terms.

When humans test software, they may interact with the software through its user interface or API, and they may use “automated” testing tools, and they may even train a machine learning model to help them explore the potential behaviors of the system under test. They do all kinds of different things. The term “manual testing” doesn’t adequately describe all this.

By the same token, the term “automated testing” doesn’t always mean what it implies – that executable test cases are initiated automatically, without direct human intervention, when a triggering event occurs. The triggering event is usually to commit code changes to the version control system, which is monitored by a continuous integration server.

Both terms – automated and manual testing – tend to be used in a casual and imprecise way. Be aware of that as you seek information about these practices. For what it’s worth, nearly all testing I have seen in mainframe environments has been of the “manual” variety (bearing in mind the loose definition of that term).

In the context of the question about the value of unit testing, I would say the unit test cases will be far more valuable when they are incorporated into a continuous delivery pipeline than when they are executed manually on an ad hoc basis.

Frequency, cost, and risk of change

When you read/hear about people’s preferred software development practices, you will find they often express themselves in absolute terms. You should always write unit tests. You should never write unit tests. You should always use mocks and stubs. You should never use mocks and stubs.

Working on the back end of a z/OS system, you have an additional consideration, beyond what front-end developers usually think about, when deciding whether to write unit tests. The outer layers of code – what the kids call the “full stack,” although it excludes the mainframe platform – tend to have the following general characteristics:

They are “windows” into the systems of record that access core data via APIs; they are not, themselves, systems of record;
They tend to change frequently, to provide customers with an enhanced user experience;
The risk of change is primarily the risk of crashing the production environment; as they are not systems of record, the risk is not (usually) that core data will be corrupted; and
The time required to modify code and test it within reason is shorter than the time required on the back end (mainframe).

In contrast, the applications that live on the back end tend to have the following general characteristics:

They are the systems of record for the enterprise;
They tend to change infrequently, as they have been stable for many years;
The risk of change is primarily the risk to core data; the risk of crashing the mainframe is very low; and
The time required to modify code and test it within reason is usually relatively long, primarily because of the numerous interconnections between systems and data flows in the back end.

Of course, these are generalizations. Many large companies have no mainframe systems at all, and no legacy code that goes back deep into the previous millennium. But this article pertains to organizations that have long-lived mainframe applications. In those cases, the systems of record are not in the front end.

Frequency of change is a consideration for deciding whether it’s worth the effort to write automated unit tests. When your team is deploying changes to production every week or two, as is typical of teams using a process like Scrum or Extreme Programming, you simply don’t have time to repeat all the necessary testing in a “manual” way before every deploy.

Your back end application may change only a couple of times per year. Does that mean you don’t need unit tests? It depends. If you must change the application, it’s because of some significant business reason.

In the financial sector, for instance, you might have to implement support for a regulatory change. You don’t have six months to test the change. Regulators will not care if you say the change is “dev complete” and you’ve handed it off to the testing team.

So, even if the frequency of change is low, the need to get the change through the delivery process quickly may be high.

Risk is another factor to consider. If the application you support is core to the business, an error may have high costs. When changes are infrequent, this may be an even more important factor, as people will forget details about the application and how to test it properly in the long intervals between changes.

Even if your group will never adopt a one- or two-week delivery cycle, there may be very good reasons to consider writing executable unit tests for the application.

How does the type system of a programming language affect the necessity or value of unit tests?

This topic may not have come up in your internal discussions about unit testing mainframe applications, as the traditional mainframe languages are all similar with respect to type systems.

People who enjoy discussing the technical details of programming languages may take issue with the simplified taxonomy of type systems that I am about to mention, but for purposes of this article I think it is sufficient.

Some languages have strong typing. Once you have defined types correctly, you can depend on them to protect you from many kinds of programming errors. Code that uses a type inappropriately will not compile. The implication is you need not write explicit unit test cases to cover those situations. Haskell is an example of a language with strong typing.

Some languages have static typing. Once you have defined a class with certain attributes or a variable of a predefined type, you can depend on the definition to protect you from certain kinds of runtime problems. Code that uses a class or an object based on that class inappropriately may not compile. The implication is you need to write more unit test cases to cover all necessary scenarios than you do when using a strongly-typed language. Java is an example of a language with static typing.

Some languages have dynamic typing. When you declare a variable, the language does not know what type of data it may hold. At run time, either the runtime environment or code generated by the compiler will “guess” the data type of the variable based on the values your program tries to assign to it. The implication is you need to write more unit test cases to cover all necessary scenarios than you do when using a statically-typed language. JavaScript is an example of a language with dynamic typing.
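
Here is a small sketch of that implication in Python, which is also dynamically typed. The function names are hypothetical. Nothing stops a caller from passing text where a number was intended, and the result is silently wrong rather than an error, so only an executable check (or an explicit guard) will catch it.

```python
def double_amount(amount):
    """Intended for integers, but the language does not enforce that."""
    return amount * 2


def double_amount_guarded(amount):
    """Same logic with an explicit run-time guard added by the programmer."""
    if not isinstance(amount, int):
        raise TypeError(f"expected int, got {type(amount).__name__}")
    return amount * 2


# With a number, the function does what we meant.
assert double_amount(21) == 42

# With text, Python repeats the string instead of failing.
assert double_amount("21") == "2121"
```

In a language where the compiler rejects the second call, you would not need a test case for it. Here you do.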

Traditional mainframe programming languages have no type system at all. This may seem an odd comment at first glance. What I mean is that variable declarations in, say, Cobol, give the compiler information about what sort of object code to generate, but they do not prevent the “wrong” type of data from finding its way into those variables at run time. “Types” in this environment refer to data types, not “object” types. The compiler can check for appropriate Move statements to a limited extent, but not completely.

For instance, when the Cobol compiler sees a Data Division item that has a numeric Picture clause and a Usage clause of COMP-3, it generates object code that assumes that item will contain packed decimal data. But the generated code does not guarantee the item cannot receive data that is not in the packed decimal format. The “variables” exist back-to-back in contiguous virtual memory addresses. They are not separate “things” in memory.

Data item names are shorthand for an offset and length from some starting address, such as the start of Working-Storage, Local-Storage, or Linkage. They are not “managed” at run time as separate variables that have a specific type. The implication is you need to write even more unit test cases to cover all necessary scenarios than you do when using a dynamically-typed language.
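
The offset-and-length idea can be simulated outside Cobol. The sketch below is in Python and models a hypothetical two-field record, invented for this illustration, as a single block of bytes. The helper names `store`, `fetch`, and `unpack_comp3` are my own, not part of any mainframe tool. It shows how a group-level move of character data leaves bytes in a COMP-3 field that are not valid packed decimal, and that nothing complains until the field is used.

```python
# Hypothetical record layout, invented for illustration:
#   01  CUST-REC.
#       05  CUST-NAME  PIC X(5).
#       05  CUST-BAL   PIC S9(5) COMP-3.   (3 bytes: 5 digits plus a sign nibble)
# Each name is only an (offset, length) pair into one block of storage.
FIELDS = {"CUST-REC": (0, 8), "CUST-NAME": (0, 5), "CUST-BAL": (5, 3)}
storage = bytearray(8)


def store(name: str, data: bytes) -> None:
    """Copy bytes into a field's storage range. Length is checked, content is not."""
    offset, length = FIELDS[name]
    if len(data) != length:
        raise ValueError(f"{name} holds {length} bytes, got {len(data)}")
    storage[offset:offset + length] = data


def fetch(name: str) -> bytes:
    """Return the bytes currently occupying a field's storage range."""
    offset, length = FIELDS[name]
    return bytes(storage[offset:offset + length])


def unpack_comp3(raw: bytes) -> int:
    """Decode packed decimal: two digits per byte, with the sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles += [byte >> 4, byte & 0x0F]
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"not valid packed decimal: {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


# A well-formed value: digits 12345 followed by sign nibble C (positive).
store("CUST-BAL", bytes.fromhex("12345C"))
assert unpack_comp3(fetch("CUST-BAL")) == 12345

# A group-level move of eight EBCDIC characters overlays both fields.
# cp037 is one of the EBCDIC code pages shipped with Python.
store("CUST-REC", "JONES   ".encode("cp037"))
assert fetch("CUST-BAL") == b"\x40\x40\x40"   # three EBCDIC spaces, not a number
```

Calling `unpack_comp3(fetch("CUST-BAL"))` at this point raises `ValueError`. On z/OS, arithmetic on a field in that state typically ends in a data exception. A unit check that feeds the code malformed field contents is the only protection you get, because the language offers none.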

Cobol variables are really fields – a range of virtual storage addresses, and not objects managed at run time. At the same time, Java fields are really variables (or instance members) – objects that are managed at runtime by the JVM, and not merely a range of memory addresses. So, we have some ambiguity with terms like variable and field.

It’s useful to understand what is really meant so we can determine how much effort will be required to maintain a suite of unit tests.

When should we use example-based tests vs. property-based tests?

Unit tests are really “checks,” as mentioned previously. There are a couple of flavors of executable checks that developers use: example-based and property-based. Example-based checks (or test cases, if you will) are used far more frequently than property-based checks.

With property-based testing (PBT), we define characteristics of the code under test and allow a tool to generate test data based on those characteristics. It can be very useful for exposing edge cases that we overlooked in our design.

As PBT has gained popularity and tool support has improved, developers have started to learn it, and some have come to favor it over example-based testing. Many of us use both according to our best judgment.
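
To show the difference between the two flavors, here is a rough sketch in Python. The `split_installments` function is hypothetical. The property check is hand-rolled with the standard `random` module so the example has no dependencies. It is not how a real PBT library works internally; tools such as Hypothesis for Python add smarter data generation and shrink failing inputs to a minimal case.

```python
import random


def split_installments(total_cents: int, count: int) -> list:
    """Split an amount into `count` payments that differ by at most one cent."""
    base, extra = divmod(total_cents, count)
    return [base + 1 if i < extra else base for i in range(count)]


# Example-based check: one hand-picked input and its exact expected output.
def check_example():
    assert split_installments(1000, 3) == [334, 333, 333]


# Property-based check (hand-rolled): generate many inputs and assert
# characteristics that must hold for every one of them.
def check_properties(trials: int = 1000, seed: int = 42):
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        total = rng.randint(0, 10_000_000)
        count = rng.randint(1, 60)
        parts = split_installments(total, count)
        assert len(parts) == count            # right number of payments
        assert sum(parts) == total            # no cents lost or invented
        assert max(parts) - min(parts) <= 1   # payments are as even as possible


check_example()
check_properties()
```

The example-based check documents one case a reader can verify by eye. The property check says nothing about any specific output, but it would catch a rounding bug on inputs nobody thought to write down.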

There are debates about which is better, if indeed there is a single approach that is universally “better.” In the z/OS environment, these debates are moot as there are no tools to support PBT. As a practical matter, you will use example-based unit checks to support the existing applications in traditional languages. But it may be something to think about for future reference.

What are some practical guidelines for the design of unit tests?

Many of the issues people write about and speak about with respect to unit tests come down to problems with the design and scope of the test cases. It’s very common for people to create test cases that are far too large to function effectively as unit checks.

Often they feel there is no choice, as they must cope with monolithically-designed code that is tightly coupled to external dependencies. They may not know how to refactor the code, or they may feel refactoring is too time-consuming to be useful, or they may simply be reluctant to try refactoring because it is unfamiliar to them. They become frustrated and dismiss unit testing categorically.

This topic is very relevant to the z/OS environment. In supporting existing mainframe applications that may be decades old, you will find monolithically-designed programs that are tightly coupled to external dependencies every day, especially on the batch side. Unless you restructure some of that code to enable smaller parts of it to be executed in isolation, it will be a hassle to write and maintain a suite of unit tests.
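
The kind of restructuring I mean can be sketched as follows. The example is in Python for brevity, and the record layout (account, days overdue, amount, separated by commas) and the 30-day rule are invented for illustration. The “before” version cannot be checked without a real file. The “after” version pulls the decision logic into small functions that can be exercised with in-memory data.

```python
# Before: file access, parsing, and the business rule are tangled together.
# The only way to check the rule is to prepare a file and run the whole thing.
def total_overdue_monolithic(path: str) -> int:
    total = 0
    with open(path) as records:
        for line in records:
            _, days, amount = line.split(",")
            if int(days) > 30:
                total += int(amount)
    return total


# After: each concern is a small chunk that can be executed in isolation.
def parse_record(line: str) -> tuple:
    """Turn one comma-separated record into (account, days, amount_cents)."""
    account, days, amount = line.strip().split(",")
    return account, int(days), int(amount)


def is_overdue(days: int) -> bool:
    """The business rule on its own: overdue means more than 30 days."""
    return days > 30


def total_overdue(lines) -> int:
    """Sum overdue amounts from any iterable of record lines (a file or a list)."""
    total = 0
    for line in lines:
        _, days, amount = parse_record(line)
        if is_overdue(days):
            total += amount
    return total


# Microtests need no dataset: a list of strings stands in for the file.
assert is_overdue(30) is False
assert is_overdue(31) is True
assert total_overdue(["A1,45,1000", "A2,10,500", "A3,31,250"]) == 1250
```

The production caller still opens the real file and passes it to `total_overdue`, so behavior is unchanged. What changes is that the boundary case of the rule (30 versus 31 days) can now be checked directly.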

In the section above entitled What is the scope of a unit test?, I listed some references from GeePaw Hill and Michael Feathers. Those references and other material online will help you see how to craft unit test cases that are well isolated and zero in on specific behaviors of the code under test at a fine level of detail.

Working with languages like Cobol, you will find the large, monolithic programs are difficult to break apart to enable unit testing. Here is a write-up with some general suggestions, and a Cobol example: Refactoring for Testability.

How does all this apply to “legacy” code running on z/OS?

IBM’s ongoing “modernization” program for mainframe technologies has resulted in the emergence of integrated development environments, testing frameworks, and other facilities similar to those used to support other languages and platforms. Support for executable tests is available from IBM and other mainframe-focused software companies, even if their concept of a “unit” doesn’t quite satisfy me or others like me.

I know of at least four companies offering Windows-based mainframe-savvy development environments built on the Eclipse IDE. Two IBM Cobol editors now support basic refactoring operations for that language. I’m trying to enable fine-grained microtesting via the cobol-check open source project. Things are moving along well with respect to tools.

The practices that employ those tools are still pretty new to mainframers. It’s still common for code to remain checked out of version control for weeks at a time, and for different individuals to modify the same parts of an application separately and then try to merge their changes later, at the cost of some pain. It’s still common for enterprises to rely mainly on “manual” testing, even for mission-critical core applications that require substantial regression testing prior to releases. It’s still common for any single piece of work to cross the boundaries of several siloed teams before it can be deemed complete. It’s still common for complicated deploys to be done manually, with a high probability of human error.

It’s still common for “agile” consultants and coaches to ignore the mainframe environment simply because it is unfamiliar to them, and not because enterprises would not benefit from improving agility in that area. But what’s the use of having only the outer layers operating in an “agile” fashion, when the mission-critical core applications are not supported in the same way? We like to talk about delivering “vertical slices” of functionality, but how delicious is a slice of cake that only penetrates the icing on top, and doesn’t go all the way down to the plate?

Microtests, test-driven development, and incremental refactoring are three pieces of the puzzle that are still missing from the mainframe picture. It’s time to move forward.