Slack, Flow, and Continuous Improvement

One of the key ways to keep work moving forward is to avoid working on too many things at the same time. Ideally, a person should finish what they’re working on before starting anything else. Similarly, a team should complete the work item or ticket or story (or whatever they call it) they’re working on before picking up the next one. At a larger scale, a software delivery organization should limit the number of projects in flight concurrently, and strive to “stop starting and start finishing,” as David Anderson put it. That’s what portfolio management is for (among other things).

Coming at the problem from a variety of directions, many people have observed that we can finish a list of tasks in less overall time if we complete them one by one than if we start many tasks and try to juggle them. Your friendly neighborhood agile coach has probably facilitated games with your team in which you experience this effect as you fold paper airplanes, build things out of interlocking plastic blocks, write down columns of letters and numbers, or place dots on sticky notes, first with context-switching and multiple items in progress, and then with single-piece flow.
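If you'd rather see the effect in numbers than in paper airplanes, here is a minimal sketch in Python. It is a toy model, not a measurement of real work: it assumes tasks are whole numbers of equal-sized time units, that "juggling" means giving each unfinished task one unit in turn, and that every switch costs a fixed amount. The function names and the switch_cost parameter are invented for illustration.

```python
def completion_times_sequential(durations):
    """Finish each task completely before starting the next one."""
    finished, clock = [], 0
    for d in durations:
        clock += d
        finished.append(clock)
    return finished


def completion_times_round_robin(durations, switch_cost=0):
    """Work one time unit on each unfinished task in turn.

    Assumes every duration is a positive integer. switch_cost is a
    hypothetical penalty paid each time we move to a different task.
    """
    remaining = list(durations)
    finished = [None] * len(durations)
    clock, last = 0, None
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r == 0:
                continue
            if last is not None and last != i:
                clock += switch_cost  # time lost getting back into the task
            clock += 1
            remaining[i] -= 1
            last = i
            if remaining[i] == 0:
                finished[i] = clock
    return finished


tasks = [3, 3, 3]
print(completion_times_sequential(tasks))                  # [3, 6, 9]
print(completion_times_round_robin(tasks))                 # [7, 8, 9]
print(completion_times_round_robin(tasks, switch_cost=1))  # [13, 15, 17]
```

Even with free switching, juggling delays the first two tasks without finishing the last one any sooner. Once switching has a cost, every task finishes later.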

In Lean terms, what you’re experiencing in those games (and possibly in your real work, too) is a high level of work-in-process, or WIP. Controlling WIP is the primary “knob” we can turn to manage flow in a process. Whether the process itself is inherently efficient or not, limiting WIP is an effective way to get the most out of it and to help expose parts of the process that impede flow. It’s really easy to do. We just have to say, “We’re going to start no more than two items at a time.” There. We just set a WIP limit of 2. No stress, no strain.
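To make the idea concrete, here is one way a WIP limit could be expressed in Python. This is only a sketch: the Board class and its method names are hypothetical, not taken from any particular tool. The only rule it enforces is that nothing new gets pulled while the in-progress column is full.

```python
class Board:
    """A toy work board that refuses to start work beyond its WIP limit."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.backlog = []
        self.in_progress = []
        self.done = []

    def add(self, item):
        self.backlog.append(item)

    def pull(self):
        """Start the next backlog item only if a WIP slot is free."""
        if len(self.in_progress) >= self.wip_limit or not self.backlog:
            return None  # stop starting...
        item = self.backlog.pop(0)
        self.in_progress.append(item)
        return item

    def finish(self, item):
        """...and start finishing: completing an item frees a slot."""
        self.in_progress.remove(item)
        self.done.append(item)


board = Board(wip_limit=2)
for name in ["A", "B", "C"]:
    board.add(name)

print(board.pull())  # A
print(board.pull())  # B
print(board.pull())  # None, because the limit of 2 has been reached
board.finish("A")
print(board.pull())  # C
```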

Context-switching overhead

One reason for the debilitating effects of high WIP, particularly when creative knowledge work is involved, is context-switching overhead. When we’re deeply involved in our work, we get into a state of flow; not Lean-style flow, but Csikszentmihalyi-style flow. Some people call it “being in the zone.” We’re holding a lot of details in our minds, and the problem we’re working on feels a bit like a living entity.

When we’re interrupted, those details escape and we lose focus; the entity dies. When we return to the first task, it takes us some time to get back to the state we were in at the time we were interrupted. Many software developers will tell you it takes between 10 and 20 minutes to get back into the zone, depending on the task. When we’re juggling more than, say, two items in this way, we can lose significant time as we try to remember where we left off with each task in the rotation. Of course, that will affect Lean-style flow, too.

Some researchers have found that people seem to operate at an optimal level when they’re juggling two tasks. With a single task at a time, there’s a chance of getting stuck or bored. At a team level, there may be a delay when an item is blocked. While some team members address the blocker, others can work on a second item. The trick is to avoid starting more and more items as additional blockers appear. That’s a slippery slope.

With more than two things in flight, context-switching overhead starts to become a factor. With that in mind, it may be good advice to limit yourself to a single work-related task at a time, leaving a second “slot” available in your mind to deal with administrative activities that may be part of your job, but not part of “the work” as such. Note that even given a natural ability to juggle two things, as a practical matter you only have capacity to handle one work-related thing at a time with due focus and effectiveness.

Anyone who worked in the software field in the 1980s-1990s is all too familiar with the impact of context-switching overhead. The conventional wisdom of the day was to organize people in a “matrix” and allocate them to multiple projects simultaneously on a percentage basis. An individual might be assigned 50% to Project A, 20% to Project B, 10% to Project C, 15% to Project D, and 5% to Project E.

Without external controls forcing them to function differently, most people tend to fall into the two-things-at-a-time pattern by default. After all, it isn’t realistic to think you can perform software development effectively by spending 5% of your time on it. The nature of the work requires uninterrupted focus for a longer period of time; maybe 2 to 4 hours at a stretch.

Five percent of a 480-minute work day amounts to 24 minutes. Deduct time for meetings, toilet breaks, and administrative activities (say, 10 minutes), and poor old Project E isn’t getting much love; especially when around 15 of those minutes are spent recovering from a context-switch. That leaves negative one minute per day of focused work on Project E. How good is that solution likely to be?
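The arithmetic above is easy to reproduce. The Python sketch below uses the same figures as the paragraph (a 480-minute day, 10 minutes of overhead, 15 minutes of context-switch recovery); the function and parameter names are made up for illustration.

```python
WORK_DAY_MINUTES = 480


def focused_minutes(allocation_pct, overhead_minutes=10, switch_recovery_minutes=15):
    """Minutes of focused work left from one project's slice of the day."""
    allocated = WORK_DAY_MINUTES * allocation_pct / 100
    return allocated - overhead_minutes - switch_recovery_minutes


print(focused_minutes(5))   # -1.0 for Project E
print(focused_minutes(50))  # 215.0 for Project A
```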

Well, no one works negative one minute a day on anything. What does that even mean? What happened, of course, was that people would charge five percent of their time to Project E because that’s what the time management system allowed them to enter, while in reality they were focusing on Project A most of the time and Project B some of the time, and that’s all. As far as management could tell, everyone was juggling balls like star circus performers. But that perception resulted from artificial metrics.

Balancing capacity and demand

One concept we’ve borrowed from the manufacturing industry is the idea of balancing capacity and demand. For a software team, “capacity” refers to the amount of work they can usually complete in a given time interval. Teams that use a time-boxed iterative process model such as Scrum or Extreme Programming measure capacity with their Sprint length or iteration length as the time interval. Teams using other sorts of process models use some number of days or weeks as the time interval.

In any case, if we count the number of work items the team completes in each time interval, after three or four such intervals we can see what their delivery capacity really is. Sure, there are variations. For one thing, there’s no consistent unit of measure. Even when teams try very hard to level out the size of their work items, some tasks will take longer than others. Some teams use a sizing scheme of some sort, like “story points” or “ideal hours,” to try to arrive at a consistent unit of measure for capacity, to minimize such variation. And depending on the size and complexity of the organization, many teams may not be able to deliver a solution increment all the way to the end customer frequently; they may have an adjusted “definition of done” that fits their circumstances. Over time, all those factors will change as people gradually improve their processes.

Exactly how each team does these things is an “implementation detail” for our immediate purpose here; it doesn’t matter. The point is we want to look at a team’s mean capacity, and not worry too much about short-term variations from task to task. WIP limits are a tool we can use to try and maximize flow, within whatever constraints apply in a given situation.
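As a small illustration of looking at mean capacity rather than individual variations, here is a Python sketch. The counts are invented sample numbers for four intervals, not data from a real team, and the function name is hypothetical.

```python
from statistics import mean


def capacity_summary(completed_per_interval):
    """Summarize how many items a team finished in each time interval."""
    return {
        "mean": mean(completed_per_interval),
        "low": min(completed_per_interval),
        "high": max(completed_per_interval),
    }


# Made-up counts of completed work items for four iterations.
print(capacity_summary([7, 9, 6, 8]))  # {'mean': 7.5, 'low': 6, 'high': 9}
```

The low and high figures show the short-term variation; the mean is the number we would plan around.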

When we adjust WIP limits and observe flow, we’re applying the Theory of Constraints. According to that model, in any multi-step process one of the steps has the least capacity. That step is called the “constraint.” The system as a whole can operate no faster than the capacity of the constraint. When we have other steps in the process operating faster than the constraint, we accumulate unfinished inventory. That’s defined as a form of “waste” in Lean Thinking.
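Here is a rough Python sketch of that model. It assumes a strictly linear process with generic steps A, B, and C, made-up capacities, and no WIP limits at all, so every step simply works as fast as it can. All names and numbers are assumptions chosen for illustration.

```python
def find_constraint(capacities):
    """Return the name of the step with the least capacity."""
    return min(capacities, key=capacities.get)


def system_throughput(capacities):
    """The system as a whole can go no faster than its slowest step."""
    return min(capacities.values())


def simulate_inventory(steps, capacities, arrivals_per_interval, intervals):
    """Push work through the steps with no WIP limits.

    Returns the unfinished inventory queued in front of each step
    and the number of items that made it all the way through.
    """
    queues = {s: 0 for s in steps}
    done = 0
    for _ in range(intervals):
        queues[steps[0]] += arrivals_per_interval
        for idx, step in enumerate(steps):
            moved = min(queues[step], capacities[step])
            queues[step] -= moved
            if idx + 1 < len(steps):
                queues[steps[idx + 1]] += moved
            else:
                done += moved
    return queues, done


steps = ["A", "B", "C"]
capacities = {"A": 5, "B": 4, "C": 2}  # items per interval (made up)

print(find_constraint(capacities))    # C
print(system_throughput(capacities))  # 2
print(simulate_inventory(steps, capacities, arrivals_per_interval=5, intervals=4))
# ({'A': 0, 'B': 4, 'C': 8}, 8)
```

After four intervals, only 8 of the 20 items that entered the system are finished. The other 12 are sitting in front of the slower steps as unfinished inventory, and the pile in front of the constraint keeps growing.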

In software development processes, unfinished inventory can take the form of stale requirements, untested code, tested but undeployed code, deployed but unreleased code, and so forth. Software development work is inherently less predictable than a manufacturing line. One item might require more time in analysis; another more testing; another may be difficult to design and code. Each step in the process will speed up and slow down. In software work the picture is complicated further by the fact that all these activities are interleaved; there’s no linear sequence of steps like analysis -> code -> test -> deploy. All those things happen together in small chunks.

Yet, there may be a multi-step process between teams in a software organization. For instance, some organizations separate work on different technical platforms or at different architectural layers, such as mobile apps, web apps, mid-tier apps, APIs, and back-end apps, as well as separating application development teams from infrastructure engineering teams, database management teams, and so forth. Any given initiative may require the services of multiple teams.

To keep the work flowing smoothly, we may need to allow some level of inventory to exist to provide a buffer for variations in the speed of each step in the process. In Lean Thinking this is called “buffer management.” The reason we accept inventory to provide buffers is that flow is more important than minimizing waste. If you think it through, you’ll see the effect of having empty buffers (also known as “starved queues”) or overloaded buffers. Getting the balance right requires observation and mindful adjustment of WIP limits.

When we adjust WIP limits to ensure our system doesn’t try to operate faster than its constraint, we’re applying the first three of the Five Focusing Steps. That’s the basic process improvement mechanism defined in the Theory of Constraints. The steps are:

1. Identify the constraint
2. Exploit the constraint
3. Subordinate everything else to the constraint
4. Elevate the constraint
5. Avoid inertia – go back to step 1

The first three steps are for figuring out where the constraint is and then maximizing the effectiveness (that is, the throughput) of the system as it currently stands, without trying to “improve” it. Step 4 is where we might try to improve the system by improving the constraint. At that point, some other step becomes the constraint, and we can’t guess which one it will be in advance, so we have to go back to step 1. The idea is to continue doing this indefinitely.

For software development teams and other teams working on software-related activities, “exploit the constraint” means that if your team is the constraint, you’re going to be at 100% utilization. You’re going to be busy all the time. “Subordinate everything else to the constraint” means every other team will have idle time. The purpose of the idle time is to enable continuous flow. When a team’s services are needed, they will be available immediately because they won’t be “keeping busy” by working on some arbitrary low-priority task.

Sharpening the ax

Abraham Lincoln is often quoted as saying, “Give me six hours to chop down a tree and I will spend the first four sharpening the ax.”

Historically, one of the most damaging misconceptions about software development work was the old-school notion that once people learned how to program, the rest of their careers would consist of repeating the same work again and again. In fact, it’s a field that demands continuous learning and improvement. The alternative is not stability or predictable performance; it’s steady deterioration of skills, and following from that, deterioration of software quality and value provided to customers.

So, should we say, “Give me six effective working hours per day to build software and I will spend the first four doing code katas?”

Well, not quite. The general idea is sound, but we need to consider the type of work we’re doing. We do need to dedicate time for ax-sharpening, but we don’t need the same proportion of time as Abe needed.

There’s little question these days about the importance of providing time for technical teams to keep up with the field. There are questions about just how to do it, however.

Two flavors of slack

You’ll read and hear people from the Lean school of thought referring to the idle time in non-constraint steps as “slack.” They mean we’re ensuring non-constraint resources will be available without delay at the moment they’re needed.

Software developers have been talking about another concept for many years, and using the word “slack” to describe it. They’re talking about Abe Lincoln ax-sharpening time. That time is not meant to be optional, and is not meant to be used to absorb unplanned work that comes to the team.

Recently, I’ve noticed people saying and writing that the idle time that results from limiting WIP is, in fact, the time we need for ax-sharpening. They say if something urgent comes up, we can just stop practicing or learning or whatever we were doing in our “spare time” and address the urgent item immediately. This is a bit of a misconception.

Expedited work

All of the Lean-style slack is meant to promote continuous flow of planned work. Planned work may include items that require different kinds of activity or different “rules” for pull; we usually refer to these as “classes of service.”

Most Lean-based processes will have a “standard” class of service that represents the type of work they do most often. It’s usually named “standard work,” too, although no particular name is required. In software development work, some of the planned work items have target delivery dates. We might classify those as “date-driven.” Some teams have more categories or classes of work than these. It depends on the work they do and whether some of that work has to be handled differently from other work.

But there’s also unplanned work. Unplanned work for a software team includes production support issues and bug fixes as well as urgent feature requests that come up in the natural course of iteratively testing the market with frequent releases, and (possibly arbitrary) requests that are “urgent” because of the formal power of the person requesting them. Typically, teams using Lean methods manage these things as the “expedite” class of service. Some teams even have more than one expedite-type class of service, as they receive different categories of urgent work from time to time.

Visibility and process improvement

Recently I’ve seen the suggestion that the idle time created when we limit WIP should be used to absorb expedite items so that the team can deliver them without impacting flow. This is touted as a Good Thing. I don’t think this is a Good Thing.

For one thing, it destroys the team’s Abe Lincoln time. My observation in the field has been that once this begins, it balloons into the “normal” way of operating. Teams lose their ax-sharpening time altogether, and every team is soon back to 100% utilization with no real controls on WIP. The expedite class of service starts to be used as a substitute for proper prioritization and scheduling of work. Before long, the organization is back in the 1980s again, and delivering accordingly. At the same time, technical staff have no practical way to keep their skills up to date. At that point, the whole “lean” thing is broken.

When a process has too many expedite items, it’s a signal that there’s a problem. It’s likely a “common cause” problem; a fault in the system itself. We want to see that signal as early as possible so we can take appropriate action. If teams are using their idle time as a buffer to absorb expedite requests, then they’re concealing information about a process problem from everyone else in the organization. In effect, they’re nurturing and institutionalizing the problem of too many expedites, rather than solving it.

Explicitly define slack time

It’s easy to avoid this situation if we keep in mind the differences between the two flavors of slack. WIP limits will create idle time on non-constraint teams, which we can call “slack” if we like that word. That’s to promote continuous flow of planned work.

In general, we want to minimize expedited work. Otherwise, we’ll lose control of the process and it will quickly deteriorate. To do that, we need to exploit a basic characteristic of lean organizations: Visual management.

Let the system reveal its own weaknesses. Don’t cover them up by absorbing unplanned work into teams’ Abe Lincoln time. If we’re getting too many expedite items, let’s find out why and correct the problem.

Given an 8-hour work day, some of that time is not really available for heads-down work on planned tasks. Some portion of it is used for resting, eating, coffee breaks, stretching, meetings, administrative tasks, and other things that are necessarily part of a work day. Most people plan for that, and they set expectations that technical teams will be able to dedicate, say, 5 and a half or 6 hours a day to heads-down work.

The way to set aside Abe Lincoln time and keep it from being “available” for expedite requests is to include it among the things that are subtracted from the original 8 hours per day. It’s only the remaining 5 and a half or 6 hours that are available for planned work anyway, regardless of expedited work that may come up. If we treat Abe Lincoln time as a first-class citizen of the organization and set aside, say, 30 minutes a day for it, then we’re down to maybe 5 hours per day for planned work. We can collect those 30-minute segments and put them together so that we have a practical chunk of time for ax-sharpening. That’s realistic, practical, and sustainable, and it won’t break the Lean system or burn out the workers.
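The budget works out as follows in a short Python sketch. It assumes 2.5 hours a day of unavoidable overhead (which leaves the 5 and a half hours mentioned above), 30 minutes a day of Abe Lincoln time, and a five-day week. The function names and default values are assumptions for illustration; plug in your own figures.

```python
def daily_budget(work_day_hours=8.0, overhead_hours=2.5, ax_sharpening_hours=0.5):
    """Split the work day so ax-sharpening is subtracted up front."""
    planned = work_day_hours - overhead_hours - ax_sharpening_hours
    return {
        "overhead": overhead_hours,
        "ax_sharpening": ax_sharpening_hours,
        "planned_work": planned,
    }


def weekly_ax_sharpening(ax_sharpening_hours=0.5, days_per_week=5):
    """Collect the daily segments into one practical chunk per week."""
    return ax_sharpening_hours * days_per_week


print(daily_budget())          # {'overhead': 2.5, 'ax_sharpening': 0.5, 'planned_work': 5.0}
print(weekly_ax_sharpening())  # 2.5
```

Because the ax-sharpening time comes off the top, it never shows up as capacity that an expedite request could claim.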

Continuous improvement

It’s natural for people to worry about the impact of reducing the amount of “work time” for teams. After all, before they started their Lean and/or Agile “transformation” initiative, it was all they could do to keep their corporate heads above water with a large staff working overtime every week. How can they possibly support the business if people are spending fewer hours on the treadmill?

The good news is that the collective effects of numerous changes in the organization that occur during an improvement program will compensate for the time. With traditional methods, people had to spend a lot of time trying to get things done because of organizational constraints that made everything difficult. As we learn to separate high-value work from busy work, start to visualize our process, learn to prioritize and schedule work appropriately, reduce batch sizes, eliminate cross-team dependencies, introduce robust technical practices, and automate repetitive tasks, we find that the same-sized staff can handle all the work they did before, and then some, while assuring high quality and good alignment with customer needs.

Abe Lincoln time isn’t something “extra” that is “given” to technical people as a “perk.” It’s a business necessity to keep the enterprise competitive in the market. The cost of Abe Lincoln time is negligible compared to the cost of not enabling technical staff to keep up with advancements in the field.