
Cultural Divide and Implications for Tools and Practices: Mainframe vs. Distributed

I began my career working on IBM mainframes in 1977, moved away from that platform gradually from about 1987 to about 1994, and have recently begun to return “home” again (also gradually). I have been heavily influenced by some of the advancements in software development in the world of Java, C#, and similar languages.

As I work more closely with mainframers and with vendors and open source developers who focus on the IBM mainframe platform, I’m confronted with cultural differences between those who work on mainframes and those who work, as mainframers put it, in the “distributed” world (that is, everything that isn’t a mainframe).

The mainframe world has experienced continuous evolution in technology but has been largely untouched by advancements in development practices, such as Extreme Programming. People can recite the buzzwords, but their understanding of contemporary development practices is on the level of business suits and slide decks (for the most part).

Of course, individual perspectives and experiences vary widely; but on a general level, people with a background in one of these two areas share assumptions that are different from the shared assumptions in the other area. I’m finding these differences in assumptions lead to differences in the way development and testing tools are designed.

Different business risks in changing code

Some of the differences arise from real-world considerations. Most of the still-extant mainframe-based applications are mission-critical, market-differentiating systems for the companies that own them. They are systems of record. They contain baked-in business rules that may not be well-documented outside the source code itself.

Data stores that reside on the mainframe platform are the ones hackers are targeting to collect information about customers. They are where your financial information lives, even as you use your smart phone to carry out banking transactions.

There are business risks associated with modifying those applications that don’t apply to most other kinds of applications. As a result, organizations that support them have implemented procedures that would seem unnecessarily “heavy” in other environments.

Different lines of evolution

Some of the differences arise from the fact we have had two separate lines of development in computing since approximately the 1960s. This is a generalization, of course, but on the one hand there is the evolutionary path of business-focused computer systems, involving companies like Amdahl, Fujitsu, Hitachi, Digital Equipment Corporation, Tandem Computers, Wang, Honeywell, and many others. That segment eventually shook out most of the competitors, leaving IBM as the dominant player.

One of the core ideas has been that a central computing system handles all workloads for client systems and users.

On the other hand, there’s the Unix path of evolution in computing. I include the various PC DOS implementations and the lineage that led to Microsoft Windows in this category, as well as Unix and Linux. This lineage was used more in universities and research organizations than in large business enterprises, and took a different path from the business-focused mainframe and midrange systems (Windows being a bit of an exception, as it has been a mainstay in enterprises).

The two worlds did interact, but not deeply. People in each ecosystem invented or discovered many of the same things, and called them by different names. For instance, what Sun Microsystems called a “thread” already existed on IBM mainframes, and was called a “subtask.” There are many other such examples, in which terminology differs but the underlying concepts are essentially the same.

One of the core ideas, captured succinctly in the Sun Microsystems motto, has been “the network is the computer” (coined by John Gage). This is consistent with a tenet of the “Unix philosophy,” that any given component of a system should do one thing, and do it well. This created a different worldview and resulted in different outcomes, large and small, than the philosophy that the central system should manage “all” things.

The difference between the idea of a central computer and the idea of a network has resulted in a mainframe platform, now called IBM Z, that is far more complicated than any single Unix, Linux, or Windows node on a network. This is a characteristic of mainframes that people find challenging when they begin working on this platform after a career in the “distributed” world. They expect to find “just another” thing like Unix or Linux or Windows, maybe running an unfamiliar operating system but otherwise pretty similar.

Instead they find something quite different. Rather than a single network node that does one thing and does it well, they find (in effect) a network in a box – a system that can run multiple different operating systems concurrently, balance disparate workloads dynamically, host massive amounts of data, and manage thousands of VMs.

Conventional wisdom in the “distributed” world has been that a large, monolithic system like a mainframe was an obsolete concept, and would be replaced by networks of smaller systems that were individually simpler than a mainframe, but collectively more capable. To an extent, that approach offered small and mid-sized enterprises a more cost-effective option for business computing than a mainframe system, at the cost of somewhat more tedious configuration and operations activities. But increasing scale was accompanied by increasing complexity.

Interestingly, the sheer complexity of contemporary cloud infrastructures based on a network of systems designed with the Unix philosophy has opened the door for a second life for mainframes. Any recent version of an IBM Z system already has all the functions cloud implementers discovered were necessary to support such a facility. This is one of the reasons for the current resurgence in interest in mainframe technology. It is a natural cloud infrastructure.

Today, the lines between hardware and software are somewhat blurred. We are accustomed to working with a set of virtual machines and/or containers. Where do those components live in the physical world? It almost doesn’t matter. Whether a service is hosted on AWS, PCF, Azure, or a mainframe makes no difference to a client application or human user. To a client application, it’s just an API call. To a human user, it’s just a UI.

Different “direction” of improvements

The idea of “agile software development” gained traction after the publication of the Agile Manifesto in 2001, but many of the underlying ideas were being explored in the software industry well before then. Improvement in software development practices, trailed by improvements in tooling, began with programming, soon extended into testing, and later spread forward along the delivery process, leading to the idea of “devops” and eventually to the idea of active monitoring of operations based on observability.

In the mainframe world, the idea of “devops” was the first of these concepts to gain traction. In a sense, then, improvement began at the operations end of the delivery pipeline and grew “backward” toward the development end – the opposite of the “distributed” world. As I write this in January 2021, development practices that have been part of Extreme Programming and similar methods are only now starting to appear on the radar of mainframe programmers. For context: These practices are roughly 30 years old.

Tool improvements have also proceeded in the reverse order in the two “worlds.” In the distributed world, developers demanded tool support for development practices they were already trying to apply as best they could. For instance, continuous integration was originally a work flow practice used by Extreme Programming teams; later, tools were developed to make this easier. Today, continuous integration servers are commonplace and considered a basic part of any development setup.

Tooling for other development practices has followed the same general pattern – practice first, then tool. The pattern applies to refactoring, property-based testing, mutation testing, behavior-driven development, and other practices.

In the mainframe world, the tendency has been for tools to be introduced first, and practices to be adopted after tooling was in place to support them. As part of IBM’s modernization program, IBM and other vendors have built development tools that enable many good practices. Mainframe programmers are being introduced to contemporary development practices partly through these tools. The usual pattern is tool first, then practice.

This difference in “direction” has affected the way practices are adopted and applied in the two worlds. Because tooling follows developer demand in the distributed world, tools tend to support the latest (hopefully, best) practices the development community has discovered to date. Because tooling has come first in the mainframe world, the tools tend to accommodate existing development practices rather than encouraging improved ones. The designers of the tools have, for the most part, no background in contemporary development practices. So they have designed tools that make current work flows better, but that don’t necessarily enable more-contemporary work flows.

Tools and Assumptions

As this is written, several developer stacks are available to support work on mainframe code.

The first of these is the traditional stack with no off-platform components. The rest provide a rich developer environment as a front-end to a mainframe system. They include components installed both on the user’s workstation and on the z/OS system, and they assume all work will be done with a live connection between the two. All these tools were developed with certain assumptions in mind:

  • It’s best to build unit testing tools for Cobol based on existing models that are proven in the field, such as xUnit
  • Programmers want and need to be connected to the back-end mainframe system at all times during coding and unit testing (the idea of disconnected development was outside the designers’ experience, and was not considered – all these stacks have components that must be installed on the mainframe side)
  • A “unit” of code is equivalent to a whole executable or load module (no concept of fine-grained microtests)
  • It’s best to generate unit test cases from existing code (no awareness of test-first development)

Initial releases of development tools for mainframe Cobol did not include refactoring support. This is changing. IBM has implemented refactoring support in two of its enterprise text editors, and Micro Focus supports refactoring of Cobol code. Refactoring support will soon be a baseline expectation of developer tools for mainframe work.

The other assumptions are still current and may be problematic.

OO-based testing tool for a procedural language

The xUnit architecture for unit testing tools comes from Object-Oriented Programming. Tools based on this model are pretty intuitive for OO programmers working in languages like Java, C#, Python, and so forth. The problem is that Cobol is not an OO language (notwithstanding support for some OO features, which are rarely found in existing “legacy” code). Programmers accustomed to working in a procedural language like Cobol tend to struggle with a tool based on OO thinking.
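
To illustrate what that model looks like, here is a minimal JUnit 5 sketch. The PriceCalculator class and its method are invented for the example. Notice how much of the structure is object-oriented by nature: a test class, a fixture object created before each test, assertions on method calls. None of these concepts maps directly onto Cobol paragraphs and working-storage data.

    // Minimal JUnit 5 sketch of xUnit conventions; PriceCalculator is hypothetical.
    import org.junit.jupiter.api.BeforeEach;
    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    // Hypothetical class under test, included so the sketch is self-contained.
    class PriceCalculator {
        double discountedPrice(double basePrice, String customerType) {
            return "PREFERRED".equals(customerType) ? basePrice * 0.90 : basePrice;
        }
    }

    class PriceCalculatorTest {
        private PriceCalculator calculator;  // the object under test (the "fixture")

        @BeforeEach
        void setUp() {
            calculator = new PriceCalculator();  // fresh fixture before each test case
        }

        @Test
        void appliesTenPercentDiscountForPreferredCustomers() {
            assertEquals(90.00, calculator.discountedPrice(100.00, "PREFERRED"), 0.001);
        }
    }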

An argument in favor of the xUnit approach for Cobol is that the last generation of experienced Cobol programmers is retiring. Their replacements come from a younger age group who have experience with other languages and platforms. For them, an xUnit-based tool is likely to be easy to pick up.

I appreciate that argument, but there is room for disagreement. It seems to me any person who is working in a procedural language will be able to work more fluidly with a testing tool that “feels” procedural. It doesn’t matter what they were working on in their previous job. The OO model just doesn’t fit Cobol. The fact the new developer is younger than the individual they’re replacing doesn’t change the intrinsic characteristics of Cobol.

Continuous connection to a mainframe

Even without the remote work situation imposed by the Coronavirus, many software developers like to work on their local laptops without any connection to external servers. They work at home, on trains, on airplanes, at coffee shops, at parks, at the beach, or wherever.

It’s very likely remote work will continue to be common after Coronavirus has run its course. Many people have grown accustomed to it and like it. At the same time, many employers have found it convenient and cost-effective for them, as well, particularly as developers need not live in the same city or even in the same country. Employers can shop for talent. Yes, you can connect from anywhere in the world, but why create the additional points of failure and the additional security exposure when it is unnecessary to do so?

If you visualize the canonical “test automation pyramid,” you can see that the base of the pyramid accounts for the majority of work. That’s the “unit” level, or microtest level.

Per the generally-accepted notion of a “unit test” (in the distributed world), these test cases never touch files or other external resources. This means there is no hard necessity to be connected to the mainframe for (probably) 70% to 90% of development work.
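
As a sketch of how that works in practice (in Java, with all names invented for the example): the code under test depends on an abstraction rather than on a real file, and the test supplies an in-memory fake, so nothing in the test needs a dataset, a database, or a connection to the host.

    import org.junit.jupiter.api.Test;
    import java.util.List;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    // Hypothetical abstraction over a data store (a VSAM file, a DB2 table, etc.).
    interface CustomerRecords {
        List<String> recordsForBranch(String branchId);
    }

    // Code under test; it neither knows nor cares where the records come from.
    class BranchReport {
        private final CustomerRecords records;
        BranchReport(CustomerRecords records) { this.records = records; }
        int customerCount(String branchId) { return records.recordsForBranch(branchId).size(); }
    }

    class BranchReportTest {
        @Test
        void countsCustomersForABranch() {
            // An in-memory fake stands in for the real data store;
            // no file, database, or mainframe connection is needed.
            CustomerRecords fake = branchId -> List.of("ACME CORP", "GLOBEX", "INITECH");
            assertEquals(3, new BranchReport(fake).customerCount("0001"));
        }
    }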

Developers need to upload their code, compile it and run it on-platform only when they are ready for integration-level testing. The ability to work off-platform offers considerable flexibility and reduces the number of potential points of failure in the overall development setup.

This concept seems to be very challenging for experienced mainframers to grasp – including those who design and build development tools. In my conversations with them to date, they have reacted as if I were speaking Martian when I talked about doing Cobol development while disconnected from the host system.

It might be helpful to compare the situation to another, possibly more-familiar one: Embedded system development. Most of the microtest-level development can occur on a standard laptop or desktop or a VM, without burning any code. When the code reaches a point where it can’t be verified any further without loading it onto a piece of hardware, then that’s the point when we do so.

Mainframe development can be handled in much the same way. As we spend most of our development time at the microtest level, we don’t need a live mainframe connection most of the time. When we reach a point where we need to run the code against mainframe-hosted resources, we upload it.

Scope of unit tests or microtests

Most of the tool stacks available today include some sort of test automation framework. None that I have seen enable developers to write fine-grained microtests. The smallest “unit” of code the tools can support is an entire executable.

The idea of being able to exercise a single Cobol paragraph in isolation, which would be comparable to exercising a single Java method in isolation, is almost incomprehensible to most mainframers. Indeed, the creators of these tools – across many companies and open source projects – appear never to have considered narrowing the scope of a unit test case down to a single paragraph.
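
A hedged Java analogy may make the distinction concrete (the program and its method are invented for the example): a microtest calls one small method directly, the way a fine-grained Cobol test would PERFORM a single paragraph, rather than driving the whole executable end to end.

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    // Hypothetical program. Its main() would handle file I/O, looping, and reporting;
    // the small method below is the analogue of a single Cobol paragraph.
    class InterestProgram {
        static double monthlyInterest(double balance, double annualRatePercent) {
            return balance * (annualRatePercent / 100.0) / 12.0;
        }
    }

    class InterestProgramTest {
        @Test
        void computesMonthlyInterestForOneBalance() {
            // Exercises just the one "paragraph," not the entire executable.
            assertEquals(5.00, InterestProgram.monthlyInterest(1000.00, 6.0), 0.001);
        }
    }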

In addition, the idea of a “meaningful” test case that doesn’t touch any real files seems quite unusual to most mainframers. They are not in the same place conceptually as developers in the distributed world.

That said, there is evidence that Cobol programmers want a tool that can exercise code at a fine-grained level. A company in Denmark has forked the proof-of-concept project, cobol-unit-test, made a number of improvements to it, and is using it in their CI/CD pipeline. Their fork is at https://github.com/Rune-Christensen/cobol-unit-test. In addition, my experiences and those of several colleagues who have introduced that proof-of-concept to clients have been that Cobol programmers are happy to be able to write microtests, and eager for a tool that supports the practice properly (unlike a proof-of-concept).

Test-after approach baked into the tools

Another problematic assumption has to do with when microtests or unit-level examples are created. Every testing tool I’ve seen (so far) for mainframe developers assumes they will write code first, and then use a tool to generate test cases.

Bearing in mind automated testing is a relatively new practice in the mainframe world, and nearly all development work in that environment involves supporting existing code, it makes sense for testing tools to work in this way.

The problem isn’t that the testing tools can generate test cases; the problem is that they don’t make it easy for programmers to write the test cases first and use them to guide development. A second problem with generating test cases is that the cases will only be able to tell us whether the code “does what it does.” Test cases generated after the fact can’t help us discover whether the code does what it is supposed to do. That is especially important when we modify an existing program of substantial size – quite common in this environment – and we need to avoid creating regressions.

In reality, the majority of business application development consists of supporting existing application code, whether on the mainframe platform or not. In the distributed world, developers approach “legacy” code using certain techniques, such as Approval Testing and Characterization Testing to obtain a baseline understanding of the current behavior of existing code. Going forward from there, developers generally apply test-driven development with incremental refactoring. Tools that make this approach tedious will discourage developers from using good practices.
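
To illustrate the distinction, here is a hedged Java sketch (the names and business rule are invented): one characterization test pins down what the existing code does today, and one test written first expresses the intended new behavior and fails until the production code is changed to satisfy it.

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    // Hypothetical legacy routine we have been asked to modify.
    class ShippingFee {
        static double feeFor(double orderTotal) {
            return orderTotal < 50.00 ? 4.95 : 0.00;
        }
    }

    class ShippingFeeTest {
        // Characterization test: records what the code does today,
        // so later changes cannot introduce a silent regression.
        @Test
        void currentBehavior_smallOrdersPayAFlatFee() {
            assertEquals(4.95, ShippingFee.feeFor(49.99), 0.001);
            assertEquals(0.00, ShippingFee.feeFor(50.00), 0.001);
        }

        // Test-first: written before the change, this expresses the (hypothetical) new
        // requirement of a surcharge for tiny orders; it fails until the code is updated.
        @Test
        void newRequirement_tinyOrdersPayASurcharge() {
            assertEquals(7.95, ShippingFee.feeFor(9.99), 0.001);
        }
    }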

This IBM Developer article (https://developer.ibm.com/devpractices/software-development/articles/5-steps-of-test-driven-development/) shows that IBM understands this, even if they haven’t extended the idea to Cobol (yet). Perhaps their left hand should talk to their right hand occasionally.

Filling a Gap

The nascent open source project, Cobol Check, aims to fill a gap in tooling for Cobol developers. It is based on the proof-of-concept project, cobol-unit-test, but as of January 2021 it does not support all the functionality of the proof-of-concept. It is now part of the Open Mainframe Project umbrella and is under active development.

The two key characteristics of this tool that promise to make it a useful complement to a Cobol development stack are:

  • it enables fine-grained microtesting of individual Cobol paragraphs; and
  • it does not require any connection to a mainframe during development.

The design goals of the project are laid out here: https://github.com/neopragma/cobol-check/wiki/Design-Goals.