
Accuracy, performance, and maintainability of software systems

Premise: The relative importance of three key qualities of software systems (accuracy, performance, and maintainability) depends on context.

Problem: Many (most?) software developers have a rigid opinion regarding the relative importance of certain characteristics of software systems. Some will insist that performance is always a priority, and that all types of software must execute as fast as is technically possible. Others will insist that a system must always produce accurate results and that there is never a margin for error. Still others will demand that every software system be designed with maintainability in mind, regardless of context.

Solution: Understand the factors in a system’s operational context that influence the relative importance of accuracy, performance, and maintainability. Apply this understanding when making design choices.

To illustrate how context influences the relative importance of these qualities of a system, here are a few archetypical or actual examples of different contexts in which software is used.

1. NASA Space Transportation System (Space Shuttle) avionics

This example illustrates a context in which accurate results need not be produced from every execution of critical routines, even though lives are at stake.

I may have some of the details wrong, as I did not work on this system myself, but I think it’s accurate enough to make the point. The following is based on information I have read and heard.

During reentry, the shuttle was subjected to powerful buffeting as it encountered resistance from the atmosphere. To compensate for these forces, the control surfaces of the aircraft had to be adjusted very rapidly. Micro-adjustments were made approximately 1,000 times per second.

That rate is too fast for a human pilot to control manually. The avionics software calculated each adjustment based on input from sensors and made the necessary adjustments to the control surfaces. One challenge was the limited capacity of the computers of that era. Running an algorithm that guaranteed a correct result every time would have taken too long: the response to forces acting on the aircraft would have been late, and the aircraft would have spun out of control.

To achieve the necessary processing speed, programmers used an algorithm that was not 100% accurate but was much faster than a completely accurate one. The algorithm ran on each of the four main computers and was fast enough that the system could compare the answers from the different computers and discard the result from any computer that did not match the others. When a result was incorrect, the next pass through the code happened so quickly that the aircraft corrected itself.
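The compare-and-discard step can be sketched in Python. This is a minimal illustration only, not the Shuttle's actual redundancy-management logic, which is not described here: the function names, the tolerance used to decide that two answers "match", and the choice to hold the last agreed command when no majority exists are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple


def majority_vote(
    results: Sequence[float], tolerance: float = 1e-3
) -> Tuple[Optional[float], List[int]]:
    """Compare redundant answers and discard any that disagree with the majority.

    Returns (agreed_value, indices_of_discarded_answers). agreed_value is None
    when no single value is shared, within `tolerance`, by a strict majority.
    """
    best_index, best_support = -1, 0
    for i, candidate in enumerate(results):
        # Count how many answers (including this one) match the candidate.
        support = sum(1 for r in results if abs(r - candidate) <= tolerance)
        if support > best_support:
            best_index, best_support = i, support
    if best_support * 2 <= len(results):
        # No strict majority: trust none of the answers this cycle.
        return None, list(range(len(results)))
    agreed = results[best_index]
    discarded = [i for i, r in enumerate(results) if abs(r - agreed) > tolerance]
    return agreed, discarded


def control_loop(cycles: Iterable[Sequence[float]]) -> Iterator[float]:
    """Yield the command sent to the control surfaces on each cycle (hypothetical)."""
    command = 0.0
    for answers in cycles:
        agreed, _ = majority_vote(answers)
        if agreed is not None:
            command = agreed
        # With no majority, hold the last good command; the next pass
        # arrives quickly enough to correct it.
        yield command
```

For example, `majority_vote([12.5, 12.5, 12.5, 97.0])` returns `(12.5, [3])`: the fourth computer's answer is discarded and the other three are trusted.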

The programmers had to work within severe memory and processor constraints. Before the 1991 upgrade, they were working with a memory limit of well under one megabyte; the upgrade gave them a full megabyte.

Because of the memory constraint, the code was designed as a series of overlays. The code to manage lift-off was overlaid by the code to manage on-orbit operations, which was then overlaid by the code to manage reentry and landing. This design was dictated by the realities of the context and did not allow for easy-to-read, easy-to-maintain source code. In fact, any change to the avionics software had to go through nine months of testing in a simulator before it could be used on a spacecraft. It was not a system that required frequent modification.
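The overlay idea can be modeled with a toy Python class. This is purely illustrative: the class, its methods, and the segment sizes are hypothetical and are not taken from the real avionics software. What it shows is that the combined code is larger than the available memory, so only one phase's code can be resident at a time, and calling code for a phase that is not loaded is an error.

```python
from typing import Dict, Optional


class OverlayRegion:
    """Toy model of an overlay: one fixed-size memory region that holds
    the code for only one flight phase at a time (hypothetical)."""

    def __init__(self, region_size: int, segments: Dict[str, int]) -> None:
        # segments maps a phase name to the size of its code segment.
        for phase, size in segments.items():
            if size > region_size:
                raise ValueError(f"{phase} segment ({size}) does not fit in region ({region_size})")
        self.region_size = region_size
        self.segments = segments
        self.resident: Optional[str] = None

    def load(self, phase: str) -> None:
        # Loading a phase overwrites whatever code occupied the region before.
        if phase not in self.segments:
            raise KeyError(phase)
        self.resident = phase

    def run(self, phase: str) -> str:
        # Code for a phase can run only while its segment is resident.
        if self.resident != phase:
            raise RuntimeError(f"{phase} code is not loaded (resident: {self.resident})")
        return f"running {phase}"
```

With hypothetical segment sizes of 300, 350, and 380 units and a 400-unit region, the three phases total 1,030 units and could never be resident together, yet each one fits on its own. Every piece of code has to be written with an awareness of what else is (or is not) in memory, which is part of why such a design is hard to maintain.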

In this context, fast execution was essential, and human lives were at stake if the code was not fast enough. The priorities were:

  1. Performance
  2. Accuracy
  3. Maintainability

2. Online banking applications

This example illustrates a context in which accurate results must be produced every time, even if this means the customer will experience a noticeable delay in perceived response time.

This kind of application has a few general characteristics that influence our design choices:

  • It carries out financial transactions over a large, public network
  • Typically, any single transaction passes through several applications on several platforms located in various locales; it isn’t a simple in-and-out
  • Any errors in the course of the transaction must cause everything to roll back to the last known consistent state; no excuses
  • It has to be secure
  • It has to comply with government regulations
  • It has to generate the correct results every time; no excuses
  • Mistakes handling other people’s money can lead to legal action, loss of public confidence, loss of market share, penalties and fines, and lost revenue
  • By far the largest component of perceived response time is network delay, which is beyond the control of the application code or the institution's own information systems; hand-optimizing the code is a pointless exercise, because any compute time saved is lost in network delay
  • The financial services market is dynamic, and software applications that work in that space must be modified frequently
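Two items in the list above, rolling back on any error and producing correct results every time, can be sketched with Python's standard `sqlite3` and `decimal` modules. This is a single-database simplification under stated assumptions: the `accounts` table and the `open_ledger`, `to_cents`, and `transfer` functions are hypothetical names, and a real transaction that passes through several applications and platforms would need coordination across systems that this sketch does not attempt. Amounts are kept as integer cents so binary floating-point rounding cannot creep in, and the connection's context manager commits only if every statement succeeds.

```python
import sqlite3
from decimal import Decimal


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    # Balances are integer cents; the CHECK constraint rejects overdrafts.
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
    )
    return conn


def to_cents(amount: str) -> int:
    """Convert a decimal string such as "25.50" to cents, refusing fractions of a cent."""
    value = Decimal(amount)
    if value != value.quantize(Decimal("0.01")):
        raise ValueError(f"amount has more than two decimal places: {amount}")
    return int(value * 100)


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: str) -> None:
    """Move money between accounts atomically: both updates happen or neither does."""
    cents = to_cents(amount)
    if cents <= 0:
        raise ValueError("transfer amount must be positive")
    # As a context manager, the connection commits on success and rolls back
    # if any statement inside the block raises.
    with conn:
        debited = conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
            (cents, src),
        ).rowcount
        if debited != 1:
            raise LookupError(f"unknown account: {src}")
        credited = conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents + ? WHERE id = ?",
            (cents, dst),
        ).rowcount
        if credited != 1:
            raise LookupError(f"unknown account: {dst}")
```

If the credit to an unknown account fails, the debit that already ran inside the same block is rolled back, so the ledger returns to its last consistent state and no money disappears.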

Based on these considerations, the priorities are:

  1. Accuracy
  2. Maintainability
  3. Performance
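The network-delay point from the list of characteristics can be made concrete with a little arithmetic; it is the same reasoning as Amdahl's law. The numbers below are hypothetical, chosen only to show the shape of the argument: if a response spends 300 ms on the network and 5 ms computing, making the code twice as fast saves 2.5 ms, which is less than 1% of what the customer perceives.

```python
def fraction_of_response_saved(network_ms: float, compute_ms: float, speedup: float) -> float:
    """Fraction of total perceived response time removed by making compute `speedup` times faster."""
    before = network_ms + compute_ms
    after = network_ms + compute_ms / speedup
    return (before - after) / before


# Hypothetical figures: 300 ms of network delay, 5 ms of compute, code made 2x faster.
saved = fraction_of_response_saved(network_ms=300.0, compute_ms=5.0, speedup=2.0)
print(f"{saved:.2%}")  # prints 0.82%
```

Even an unrealistically large speedup cannot save more than the 5 ms of compute, so effort spent on hand-optimization buys little compared with effort spent on accuracy and maintainability.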

3. Market-sensitive, innovative apps offered on a Continuous Beta or Lean Startup basis

This example illustrates a context in which speed-to-market based on customer feedback is crucial to meet business goals.

Characteristics:

  • User experience is adjusted frequently based on feedback from real customers
  • New and modified features are pushed to market frequently; there is no “end date” for development
  • The software may be re-architected, re-platformed, and re-designed many times during its lifetime
  • An enjoyable or trendy user experience is (or may be) more important than perfect functionality
  • Users tend to be tolerant of minor errors as long as they feel they are on the leading edge

Based on these considerations, the priorities are:

  1. Maintainability
  2. Performance
  3. Accuracy

Conclusions

  1. When the delay involved in guaranteeing a correct answer will result in loss of life, performance may trump accuracy.
  2. When an incorrect answer will result in financial damage to customers, fines, and loss of market share, accuracy may trump performance.
  3. When a delay in delivering new features to the market will result in loss of revenue, maintainability may trump both performance and accuracy.