
The domino effect of technical debt

Most people who do coaching work in the IT space focus on one of three areas: (a) technical practices, (b) process improvement, and (c) organizational dynamics or “human factors.” It’s not unusual for a person to have skills in both process improvement and organizational dynamics. It is rare for the same person to work at both the level of technical practices and the level of process improvement, and even more rare for that person to be engaged for both purposes at the same time.

I’ve observed a sort of disconnection between process improvement initiatives at the organizational level and improvements in technical practices at the team level. Part of the reason may be the separation, first in formal education at universities and technical schools, and later in coaching, consulting, and training services, between technical practices on the one hand and process improvement and organizational dynamics on the other. Whatever the cause or causes, the result is that improvements in technical practices and improvements in process are treated as separate, disconnected issues.

I think the two are connected. Technical debt is more than an annoyance for the maintenance programmers who have to deal with a challenging code base. A mass of tightly-coupled code can make it very difficult, time-consuming, and expensive to implement broader organizational and process improvements.

Here’s just one example. You’re probably familiar with the idea of service-oriented architecture, or SOA. One of the enablers for this approach is a clearly-defined boundary between shared services and client applications. When the production code in the organization is based on the de facto industry-standard application architecture, in which nearly every interface is a custom-built, point-to-point connection hastily created under delivery pressure at the moment a developer realized there was no usable interface for the bit of code he or she was writing, establishing that boundary is easier said than done. It’s easier said than done even in the best case; in the typical case it’s all but impossible. What makes it all but impossible? Technical debt.
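To make that concrete, here is a minimal sketch in Java. The class, table, and connection details are hypothetical; the point is only the contrast between an ad hoc point-to-point connection and a declared service boundary that client applications can code against.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    // Point-to-point style: the client reaches directly into another
    // application's database -- an "interface" invented under delivery pressure.
    class OrderHistoryScreen {
        List<String> loadHistory(String customerId) throws SQLException {
            List<String> orderIds = new ArrayList<>();
            try (Connection con = DriverManager.getConnection(
                     "jdbc:db2://billing-host:50000/BILLDB", "app", "secret");
                 PreparedStatement ps = con.prepareStatement(
                     "SELECT ORDER_ID FROM BILLING.ORDERS WHERE CUST_ID = ?")) {
                ps.setString(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        orderIds.add(rs.getString("ORDER_ID"));
                    }
                }
            }
            return orderIds;
        }
    }

    // Service-boundary style: clients depend only on this interface, and the
    // provider is free to change schemas, hosts, or vendors behind it.
    interface OrderHistoryService {
        List<String> ordersFor(String customerId);
    }

Multiply the first style by hundreds of applications and the clean SOA boundary has nowhere to go.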

Other sorts of intended improvements sometimes run afoul of technical debt, or sometimes encourage it. For example, it makes sense to focus the creative energies of the technical staff on work that provides a competitive advantage to the company, and to use external services or commercial off-the-shelf (COTS) packages for back-office functions and commodity assets. Many organizations rely on COTS packages for shared IT assets such as business rules engines, data warehouses, ETL facilities, ERP systems, CRM systems, database management systems, content management systems, image capture systems, groupware, and many more. The intent is to minimize support costs and improve flexibility to address changing business needs, as well as to keep highly skilled technical staff available to work on the business applications that provide a competitive advantage.

Because so many “enterprise” products are designed monolithically, when key business applications have dependencies on one or more COTS packages, the overall effect is to extend lead times and increase the difficulty of delivering new solutions and enhancements. The problem is magnified when multiple applications have dependencies on the same shared assets. The release schedule for all the dependent client applications and all the asset upgrades turns out to be determined by the longest necessary lead time for any of the shared asset upgrades.
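A toy illustration of that scheduling effect, with made-up lead times: if three shared assets must be upgraded before any dependent application can ship, the slowest upgrade gates everything.

    import java.util.Map;

    class ReleaseGate {
        public static void main(String[] args) {
            // Hypothetical upgrade lead times, in weeks, for three shared assets.
            Map<String, Integer> upgradeLeadTimeWeeks = Map.of(
                "rules engine", 8,
                "data warehouse", 16,
                "ERP system", 26);

            // Every client application that depends on all three waits for the slowest one.
            int earliestReleaseWeek = upgradeLeadTimeWeeks.values().stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(0);

            System.out.println("Earliest coordinated release: week " + earliestReleaseWeek); // week 26
        }
    }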

In order to satisfy time-to-market needs, many application teams resort to building their own one-off work-arounds to avoid including any COTS configuration changes, custom plug-ins or user exit code, or version upgrades in the scope of their own projects. They just can’t afford the delay. Strategic value gives way to tactical necessities. Typically, this happens again and again and again for years on end.

The organization ends up with a hodge-podge of implementations, some of which use the COTS packages as intended, some of which go around the enterprise assets and roll their own redundant solutions, and some of which are a patchwork of both. Sooner or later, one of those COTS packages has to be upgraded or replaced. Only then do people discover it will not be a plug-and-play exercise, thanks to all the undocumented hard-coded dependencies between the client applications and the back-end services, and hard-coded dependencies between individual applications that were supposed to have routed requests through the service bus, but didn’t.
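One hedged way to picture why the swap is not plug-and-play, again with hypothetical names: vendor-specific calls scattered through client code, versus a single adapter that owns the dependency so that replacing the COTS package touches one class instead of every caller.

    // Stand-in for the COTS product's client library, so the sketch is self-contained.
    class VendorRulesClient {
        VendorRulesClient(String host, int port) { }
        double evaluate(String table, String key) { return 0.0; }
    }

    // What the upgrade team typically finds: the vendor API called directly,
    // host and table names hard-coded, repeated across dozens of applications.
    class DiscountCalculation {
        double discountFor(String customerTier) {
            VendorRulesClient rules = new VendorRulesClient("rules-host", 9099);
            return rules.evaluate("DISCOUNT_TABLE_7", customerTier);
        }
    }

    // What would have made the replacement tractable: applications code against
    // this interface, and only the adapter knows which vendor sits behind it.
    interface DiscountRules {
        double discountFor(String customerTier);
    }

    class VendorDiscountRules implements DiscountRules {
        private final VendorRulesClient rules = new VendorRulesClient("rules-host", 9099);

        @Override
        public double discountFor(String customerTier) {
            return rules.evaluate("DISCOUNT_TABLE_7", customerTier);
        }
    }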

Well-intentioned management decisions, policies, and spending guidelines, coupled with significant technical debt, can produce painful, vicious cycles. Here are two examples from my own experience. One comes from the 1978-79 time frame and the other from the 2010-11 time frame, suggesting that things are no better today than they ever were…and things were never good.

Example 1: 50,000+ employee company, 1978-79

The company had a massive IT infrastructure with thousands of applications in production. This story concerns a single internal business application. It is representative of the rest of the technical environment.

Given:

  1. Back-end is an IBM IMS/DB/DC application hosted on IBM mainframes.
  2. Front-end is a minicomputer application running at thousands of sites in 48 countries.
  3. Technical standard: No input validation logic is permitted in back-end code. Front-end logic is expected to ensure that only clean data is transmitted to the back-end. Rationale: avoid redundant processing and enhance performance.

[1] Cost-centric as opposed to value-centric mentality, leading to

[2] maintenance of very old technologies until they literally fall apart, leading to

[3] rusty old transatlantic cable corrupting messages between front-end and back-end, leading to

[4] exceptions thrown by back-end mainframe during nightly batch processing, leading to

[5] calls to application support team in the middle of the night, leading to

[6] defensive programming on the mainframe side to prevent support calls, carried out by sleepy, pissed-off people in a hurry to get back to sleep, leading to

[7a] defensive code to avoid the exceptions placed in parts of the code invoked after the original problem occurred (a sketch follows this chain), and

[7b] ad hoc methods of logging or otherwise dealing with the corrupt data, leading to

[8] fewer late-night support calls at the cost of

[9] increased technical debt, leading to

[10] arbitrarily complicated code that is hard to understand, leading to

[11] increased support costs for the application, leading to

[12] desire to re-write the application cleanly, leading to

[13] analysis of code, leading to

[14] conclusion that the code is too complicated to re-write quickly and cheaply, leading back to item [9]…reinforcing loop.
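Items [6] through [7b] are where the technical debt is born, so here is a minimal sketch, with hypothetical field names, of the difference between validating a front-end message once at the point where it enters the back-end and the scattered 2 a.m. patches the chain describes.

    // Boundary validation: corrupt data is rejected (or logged in one well-known
    // place) before any business logic runs.
    class InboundOrderMessage {
        final String accountNumber;
        final String amountField;

        InboundOrderMessage(String accountNumber, String amountField) {
            this.accountNumber = accountNumber;
            this.amountField = amountField;
        }

        boolean isWellFormed() {
            return accountNumber != null && accountNumber.matches("\\d{10}")
                && amountField != null && amountField.matches("\\d{1,9}");
        }
    }

    // The late-night patch style: each downstream step defends itself, in its own
    // way, against data that should never have reached it.
    class PostingStep {
        long parseAmount(InboundOrderMessage msg) {
            if (msg.amountField == null || msg.amountField.isEmpty()) {
                return 0L; // silently substitute a default -- the problem is hidden, not solved
            }
            try {
                return Long.parseLong(msg.amountField.trim());
            } catch (NumberFormatException e) {
                return 0L; // another ad hoc guess, made by someone in a hurry to get back to sleep
            }
        }
    }

Under the technical standard in the Given list, of course, the first style was not even allowed on the back-end, which is exactly how the second style accumulated.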

Example 2: 100,000+ employee company, 2010-11

The company is the result of a long series of mergers and acquisitions. Every IT asset, business application, system platform, and development group they acquired is still in there, somewhere.

[1] Siloed organization mirroring leftover fiefdoms from prior acquisitions, leading to

[2a] lack of coherent service layers, responsibilities scattered across apps & network layers, and

[2b] lack of common technical standards and disciplined use of development practices to control technical debt, and

[2c] isolated working groups based on skill sets around the different technologies inherited in the acquisitions, and

[2d] sense of separate identities carried over from previous companies, leading to

[3] poor coordination of design through all architectural layers and across the enterprise, leading to

[4] work-arounds for front-end components to obtain all necessary data from calls to back-end components, leading to

[5] very large response messages sent back to front-end components, leading to

[6] increased demand for memory on back-end servers, leading to

[7] the need to upgrade 32 bit servers to 64 bit systems, leading to

[8] the need for comprehensive regression testing of server code to support an upgrade, but

[9a] lack of repeatable manual test scripts and lack of automated test scripts (a sketch of the missing kind of check follows this chain), and

[9b] lack of any single testing group with overarching responsibility for full system testing, leading to

[10] no practical way to carry out comprehensive regression testing of server code in a reasonable time frame, leading to

[11] fear of risk in upgrading back-end servers, leading to

[12] delay in upgrading back-end servers, leading to

[13] degraded customer service, leading to

[14] customer exodus, leading to

[15] reduced revenue, leading to

[16] short-term quick fixes to try to lure customers back, leading to

[17] less funding available to upgrade back-end servers, leading back to item [12]…reinforcing loop.
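For item [9a], here is a minimal sketch, using JUnit 5 and hypothetical service and endpoint names, of the kind of repeatable automated check that was missing. A suite of tests like this, pinning down the contracts the front-end components depend on, is what makes a platform change such as the 32-bit to 64-bit server upgrade far less frightening.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import org.junit.jupiter.api.Test;

    class CustomerLookupRegressionTest {

        private final HttpClient client = HttpClient.newHttpClient();

        @Test
        void knownCustomerStillResolvesAfterServerUpgrade() throws Exception {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://backend-under-test:8080/customers/12345"))
                .GET()
                .build();

            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

            // The behavior the front-end depends on, written down as executable checks
            // that can be run before and after the upgrade.
            assertEquals(200, response.statusCode());
            assertTrue(response.body().contains("\"customerId\":\"12345\""));
        }
    }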

1 thought on “The domino effect of technical debt”

  1. This school of thought is prevalent in most sectors of the industry. The lack of a systems thinking approach is the big gap; connecting the required pieces and finding the right leverage points is the way forward 🙂
