Reducing process overhead in Scrum

In a social media discussion in mid-2019, several people expressed surprise at the idea that Scrum might include “overhead.” The confusion seemed genuine. Some people asked for examples. It seemed they were unable to conceive of “overhead” in Scrum.

The popularity of Scrum has led to an interesting situation in the Agile community. Many people view Scrum as The Answer. It’s the one best way. There is no possibility of improving beyond Scrum. Everything in Scrum is valuable by definition.

In reality, every process includes overhead. We do things for customers/users, and we also do things to position ourselves to do things for customers/users. When we don’t distinguish between the two, we can fall into the trap of trying to perfect our overhead activities, rather than minimize them.

Customer-defined value per Lean Thinking

It might be useful to review a concept from Lean Thinking. The first priority in Lean is to focus on customer-defined value. Okay, that sounds good. What does it mean? Basically, it refers to anything the customer wants to pay for.

Let’s consider an example from the manufacturing world, as that’s where Lean Thinking originated. Our company manufactures screws. Our customers want screws that

  • are consistent
  • fit into receptacles that have been machined to the same specification as the screws
  • are strong enough for the intended application
  • won’t expand and contract excessively
  • are durable under the intended operating conditions
  • can be picked up by a magnet.

To produce screws that meet those requirements, we invest in certain manufacturing equipment, we establish certain procedures to assure quality, we provide relevant training to our employees, we test the raw materials to ensure they are of adequate quality, and we monitor our manufacturing operation.

Our customers just want the screws. They don’t care how we make them. So, they aren’t interested in paying for our machines, procedures, training, testing, or monitoring. All those things are “overhead costs.” They are things we do to position ourselves to make screws for our customers.

We can increase margins by reducing the cost of precision machining, assuring quality, training employees, testing raw materials, and monitoring operations. If a new machine is invented that produces screws more consistently and efficiently than the old one, we might retire the old machine and install the new one. If we discover or invent a better way to assure quality, we’ll implement it. If we find more-effective ways to train employees, we’ll use them. If we learn a more cost-effective way to check the quality of raw materials, we’ll use it. If we can monitor operations just as effectively but at lower cost, we’ll adapt accordingly.

In other words, if we can reduce overhead without compromising quality or production, then we will certainly wish to do so. Our customers are not interested in subsidizing inefficient and costly ways of producing screws. They only want to pay for the end result.

All the other costs are on us. We can’t compensate for inefficient operations by raising prices, because we’re not the only company in the world that makes screws. We have to compete.

You might point out that the customers don’t really want screws. They want to assemble some sort of structure or machine, and our screws help them do that. But we can consider the screws to be the valuable product in the context of our manufacturing operation. They are the output of the process that customers are willing to pay for.

How this applies to software development

Now let’s pretend we’re a software development team, or an organization with multiple teams. Our customers want to do something that is supported, at least in part, by software that we produce and deliver. Maybe customers want to stay fit, so they keep track of their exercise and they use our software to help them do that. Maybe customers want to handle their banking transactions using their smart phones, and our software helps them do that. Maybe customers want to find their way to a location in the city, and our software helps them do that.

Just as the screws in the previous example are not really what the customer wants, in this example our customers don’t really want the code that we write. They just want to stay fit, handle their banking, or find their way around the city. But the software is the output of our work, so we can consider it the valuable product in this case.

Looking at it in this way enables us to distinguish between the things we do to produce the software, and the things we do to position ourselves to produce the software. The former are value-add activities, in Lean terms, and the latter are non-value-add activities, or overhead.

What does it mean to produce valuable software? Usually, people in the software field talk about two things:

  • build the right thing
  • build the thing right

Build the right thing

Our customers are willing to pay us for the right thing. They don’t care how much it costs us, internally, to figure out what the right thing is. That’s overhead activity. So, we want to figure out what the right thing is in the most effective way we can.

This post focuses on Scrum, so let’s narrow the context a bit. The basic model assumes there is a cross-functional team containing all the skills necessary to complete the work, and the team is fully dedicated to a single product or a single project at a time. For software development, the team is part of a larger organization and someone else in that organization knows what “the right thing” is. From the team’s perspective, knowledge of “the right thing” resides in the Product Owner, a special role defined in the Scrum framework. The team knows how to build a product, but isn’t equipped to deal with all the business-related issues connected with customers, marketing, funding, legalities, regulatory compliance, and so forth. That isn’t their mandate. Their mandate is to build the product.

Figuring out what the right thing is can be pretty challenging. It may involve market research, focus groups, A/B testing, and more. Software development teams are not set up to perform these activities, and they don’t have the expertise to do so.

Traditionally, software teams tried to find out what the right thing was by performing extensive analysis before beginning development. This proved to be very time-consuming and still resulted in significant variance between what the users needed and what the delivered software product actually did. All that analysis time, as well as all the after-the-fact time to correct errors and fill in the gaps in product functionality, was overhead. That sort of overhead with traditional methods is very costly.

Scrum calls for close collaboration between the team and the Product Owner, who is responsible for the business value of the product. This sometimes works well and sometimes not so well, but in any case it’s the basic Scrum approach. Generally, this results in teams having a much clearer understanding of user needs than with traditional methods, as well as shortening the lead time to get a working product into users’ hands. So, it’s a big reduction in overhead.

But it’s still overhead. Less overhead than before, maybe, but still overhead. Our internal collaboration during development is not what the customer wants to pay for. It’s not a “product.” It’s a way to position ourselves to build the right product.

Build the thing right

The team is expected to know how to build the thing right. An aspect of Scrum is that the team is trusted to determine how best to carry out the work of building the product. So, building the right thing is “handled,” and the team must focus on building the thing right.

Scrum literature describes such a team as self-organizing. The team members determine how to build the thing right. They are not micro-managed by anyone outside the team.

The canonical Scrum events, which used to be called “ceremonies,” are overhead. They are activities designed to help position us to build the product. This is where some in the Agile community go astray; they consider the Scrum events to be sacrosanct and permanent.

In fact, they are ways to achieve specific goals. As teams learn to achieve those goals with less overhead, they can improve their process. But if they believe Scrum is The Answer, in final form, then they will never even think about improvement. They will only think about “doing Scrum.”

Why feedback loops?

One of the foundations of Scrum is an emphasis on feedback loops. They are built into the framework at multiple levels – releases, sprints, days. When we incorporate software engineering practices with Scrum, we introduce still more feedback loops – Three Amigos sessions, test-driven development, time-boxed pair programming sessions, etc.

Frequent checking of work and communication of intent across the whole team and with stakeholders help us keep the work aligned with user needs, and help us ensure high quality at all times, so that we won’t waste time later with bug-fixing.

Why time-boxes?

All overhead activities in Scrum are time-boxed. A time-box isn’t merely a time interval; it’s a box of time. A closed box.

The time-box mitigates schedule slippage, simplifies planning and tracking, and drives process improvement.

With traditional methods, when a task took longer than expected people simply extended the schedule. This contributed to delayed delivery of solutions. It also took time away from the next task, and the next, until the project fell down like a line of dominoes. Time-boxes help prevent schedule slippage, and lead us toward adjusting scope instead.

The time-box simplifies planning and tracking because it’s straightforward to count the things that happen within a time-box, and to forecast future performance by looking at how many things happened in each of several previous time-boxes of the same length.
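
As a rough illustration of that kind of forecast, here is a minimal Python sketch. The function name and the sample numbers are hypothetical; the only assumption is that we have a count of items completed in each of several past time-boxes of the same length.

```python
import math
from statistics import mean

def forecast_sprints(completed_per_sprint, remaining_items):
    """Estimate how many more time-boxes the remaining items will need.

    Returns a (best, typical, worst) tuple based on the highest, average,
    and lowest counts observed in past time-boxes of the same length.
    """
    if not completed_per_sprint or min(completed_per_sprint) <= 0:
        raise ValueError("need past time-boxes, each with at least one completed item")
    best = math.ceil(remaining_items / max(completed_per_sprint))
    typical = math.ceil(remaining_items / mean(completed_per_sprint))
    worst = math.ceil(remaining_items / min(completed_per_sprint))
    return best, typical, worst

# Hypothetical history: items finished in each of the last five sprints.
history = [6, 8, 7, 9, 5]
print(forecast_sprints(history, remaining_items=40))  # (5, 6, 8)
```

Reporting a range rather than a single number is a design choice in this sketch; it keeps the forecast honest about the variation between time-boxes.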

The time-box drives improvement because people feel uncomfortable when they don’t complete a task within the time-box and are “forced” to stop working on it. They can’t simply keep going, because the time-box has expired, so they have to experience the consequences of not finishing. The idea is that the discomfort will inspire people to think about how they can complete the task next time without exceeding the time-box.

The fact that the inventors of Scrum came up with the idea of time-boxes demonstrates that they understood these activities are overhead and not value-add activities. They didn’t want teams burning up all their time with overhead activities.

Note that the only activity that is not time-boxed in Scrum is direct work on the product itself. That’s because we want the team to spend as much time as possible on value-add work.

Therefore, a self-improvement goal for Scrum teams is to shorten the time-boxes for the canonical Scrum events, while still achieving the goals of those events. Doing so will release more of the available time for the team to spend on value-add work.
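
To make the arithmetic concrete, here is a small Python sketch. Every number in it is hypothetical (a 10-day sprint, 8-hour days, and made-up per-person hours for each event); it only shows how shorter time-boxes translate into time released for value-add work.

```python
def overhead_share(event_hours, sprint_days, hours_per_day=8):
    """Fraction of one person's sprint spent in the listed events."""
    available = sprint_days * hours_per_day
    return sum(event_hours.values()) / available

# Hypothetical per-person hours in one 10-day sprint.
before = {"sprint planning": 4.0, "daily scrum": 2.5, "sprint review": 2.0,
          "retrospective": 1.5, "backlog refinement": 4.0}
# The same events after the team has learned to shorten most of them.
after = {"sprint planning": 2.0, "daily scrum": 1.5, "sprint review": 1.0,
         "retrospective": 1.5, "backlog refinement": 2.0}

released = sum(before.values()) - sum(after.values())
print(f"before: {overhead_share(before, 10):.1%}")  # before: 17.5%
print(f"after:  {overhead_share(after, 10):.1%}")   # after:  10.0%
print(f"hours released per person: {released}")     # 6.0
```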

Scrum supports iterative development. Iterations in Scrum are called sprints. A sprint must be long enough to allow the team to produce something meaningful, and short enough to serve the purposes listed above. When teams first learn Scrum, they may begin with a sprint length of 3 or 4 weeks. As they gradually improve their processes and practices, they can often reduce their sprint length to 2 weeks, and eventually to 1 week. Some teams learn to operate smoothly enough that they drop explicit sprints altogether, without losing any delivery effectiveness.

As the team progresses, the canonical Scrum events associated with sprint planning, sprint review, backlog refinement, and so forth also become shorter.

Other time-boxed overhead activities can usually be streamlined as the team gains experience, too.

Beware of adding overhead

Some novice Scrum teams introduce an additional event or ceremony to the model: The mid-sprint review. This happens because the team has so much work in progress that they can’t keep track of it very well, even within the scope of a single sprint. The mid-sprint review is a workaround for this problem, and not a solution. Yet, many in the Agile community see the mid-sprint review as an “improvement” to Scrum, and they institutionalize it as a permanent part of their organization’s software development process.

In fact, it adds another overhead activity to the process. It would be better to learn how to operate with shorter sprints, smaller backlog items, closer collaboration, and shorter feedback loops; that is, seek corrective action that reduces overhead rather than increases overhead.

People tend to introduce workarounds to long-term or recurring problems as a way to mitigate their discomfort. But discomfort is the driver for process improvement. Embrace it. Resolve the discomfort by eliminating its root cause, not by pasting a bandage over the top of the problem.

Daily Scrum

Novice Scrum teams may regard the daily scrum as a value-add activity. It isn’t. The purpose is to ensure everyone on the team understands what’s happening with the work, and whether anyone needs help. That’s an important goal, but the event or ceremony to achieve the goal isn’t a value-add activity in itself. It’s a way of positioning the team to perform value-add work effectively.

The traditional time-box for the daily scrum, or daily stand-up, is 15 minutes. The reason is to prevent the scrum from turning into a traditional meeting. That was a common problem at the time Scrum was invented. Some novice teams struggle to fill the 15 minutes, as they don’t have that much to say. That’s actually a good thing, and the teams could shorten the time-box for the daily scrum.

The original guidance for the daily scrum called for three questions: What did I do yesterday? What will I do today? Do I have any impediments? This is not a “rule.” Remember that Scrum was invented in an era when software developers rarely spoke to each other. Now they were asked to look team mates in the eye and say what they were working on. It was psychologically difficult. The three questions were devised to help people think of something useful to say. This isn’t a problem with software developers anymore.

When a team starts to use highly collaborative methods, the daily scrum can become shorter, as there aren’t as many backlog items in flight concurrently. For example, when a team first starts to use pair programming it’s not unusual to hear something like this in the daily scrum:

  • Claudia: “Yesterday Nicola and I paired on Story 452. We’ll finish it today. No impediments.”
  • Nicola: “Ditto Claudia.”
  • Samuel: “Yesterday Rajesh and I finished Story 449. I’m available for a new pair today.”
  • Rajesh: “Ditto Samuel.”

There’s a low signal-to-noise ratio in that daily scrum. When team members routinely pair on most tasks, there’s approximately half as much to say at the daily scrum. Yet, many people feel as if it’s mandatory for everyone to speak, even if they have nothing to contribute.

We can streamline it further by focusing on the backlog items instead of the individuals. To do this, we walk the task board backward. Start with the item that’s closest to being finished. Let the team discuss how they can pull that item forward and complete it. Then move to the next item that’s close to completion. There’s no need to cover the whole board, as we’re interested in the work that’s happening today. The team’s focus should be on finishing things rather than starting them, so we can focus on the items that are nearing completion. Team members who are working on other items need not speak up unless they have an impediment. This saves time and results in a useful outcome.
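
Walking the board backward can be sketched in a few lines of Python. The column names, the `Item` class, and the `max_items` cut-off are all hypothetical; the sketch assumes only that each item sits in one column and that columns are ordered from start to finish.

```python
from dataclasses import dataclass

# Hypothetical board columns, ordered from start to finish.
COLUMNS = ["To Do", "In Progress", "Review", "Testing", "Done"]

@dataclass
class Item:
    name: str
    column: str
    blocked: bool = False

def daily_scrum_agenda(items, max_items=3):
    """Names of items to discuss, nearest-to-done first.

    Items beyond the cut-off are included only if they are blocked.
    """
    in_flight = [i for i in items if i.column not in ("To Do", "Done")]
    # Walk the board backward: rightmost in-flight column first.
    in_flight.sort(key=lambda i: COLUMNS.index(i.column), reverse=True)
    agenda = in_flight[:max_items]
    # People working on other items speak up only about impediments.
    agenda += [i for i in in_flight[max_items:] if i.blocked]
    return [i.name for i in agenda]

board = [
    Item("Story 449", "Testing"),
    Item("Story 452", "In Progress"),
    Item("Story 455", "Review"),
    Item("Story 457", "In Progress", blocked=True),
    Item("Story 460", "To Do"),
    Item("Story 441", "Done"),
]
print(daily_scrum_agenda(board, max_items=2))
# ['Story 449', 'Story 455', 'Story 457']
```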

Another collaborative practice has become mainstream in the past few years: Mob Programming. The whole team works together on one task at a time. When a team uses Mob Programming as their standard way of working, it’s usually feasible to dispense with the daily scrum altogether. Everyone is working together in the same room all day, so everyone is always up to date regarding the current state of the work as well as any impediments blocking the team.

So, you may be able to reduce overhead time, and possibly eliminate an overhead activity entirely, by adopting more-effective work practices.

Hand-offs and cross-checking work

To be sure we build the thing right, we frequently check work in progress to detect any quality issues as early as possible. This type of activity takes many forms. It includes discussions during hand-offs between specialists on the team; verifying that the code does what was intended; reviewing code to ensure it follows architectural standards, coding standards, error-handling standards, etc.; double-checking application packaging and configuration prior to deployment; and much more.

All these activities are overhead. Collaboration, learning, and automation can contribute to reducing the overhead necessary to achieve the same goals.

Pairing across specialties can improve the quality of communication and hand-offs between specialists, and reduce the time necessary to accomplish a clean hand-off. Over time, individuals can learn enough about one another’s specialties that they become generalists. At that point, hand-offs can be eliminated altogether. That removes an overhead activity, as well as reducing wait times when work needs to be addressed by someone with a particular skill set.

Pairing within specialties helps keep everyone working in accordance with agreed standards and guidelines and avoids many small human errors that could otherwise result in costly debugging efforts and back-flows in the process. This sort of pairing is especially useful within the disciplines of analysis, testing, API design, UX design, database design, and programming.

Automation can play a big role in reducing the time spent in overhead activities. Routine, predictable testing and regression testing can almost always be automated. Test-driven development helps avoid introducing defects into the code, helps align the product with user needs, keeps the design simple and maintainable, and facilitates introducing new team members to the code base.

An automated build process provides quick feedback about recent code changes and avoids manual errors in building, packaging, and deploying the product into a test environment. The “infrastructure as code” approach avoids manual errors in configuring and provisioning environments.

Backlog Refinement

It’s important to understand what the team should build, in what order they should build it, how they will know when they’ve done enough, whether a backlog item looks as if it will require significantly more time than average, and whether a backlog item has any significant architectural implications or introduces a risk. Backlog refinement helps teams achieve these goals, but refinement itself is not a “product” the customer wants to pay for; it’s overhead.

Novice Scrum teams often get bogged down in backlog refinement. There is a tendency for people to try to document all the same things they used to document with traditional methods. Also, people want to discuss every issue that comes up in detail, and try to solve it immediately.

It takes time, mindful practice, and possibly some guidance from a coach, to learn to refine a Scrum backlog in a way that provides all the necessary information to all interested parties at the appropriate level of detail. As a team progresses with this learning, they will need to spend progressively less time in backlog refinement activities.

Technical Practices

Scrum doesn’t specify any technical practices. There’s a reason for that: Scrum isn’t limited to any single type of product development. As Ken Schwaber put it, Scrum is “a management wrapper for empirical process control.” It can be used for any type of product development, not just software. A well-known non-software example was Wikispeed, a Seattle-based startup that used Scrum to design, develop, and produce cars.

The particular things you would do if you were developing a new line of handbags and shoes will be quite different than if you were developing a new manned spacecraft for missions to the International Space Station. Those details are out of scope for Scrum. However, the expectation is that we will include the appropriate practices for the type of product we’re developing. The lack of specific technical practices in Scrum does not imply we have carte blanche to ignore well-known good practices for our particular industry.

Even within the general sphere of software development there are great variations. If we’re writing embedded software for a dangerous device, such as a medical device that exposes patients to radiation, or a grain harvester that cleans its own exhaust manifold by spraying it with diesel fuel and burning it off, we’ll use different technical practices than if we’re writing a simple game for smart phones that displays advertisements to users.

Narrowing the scope further, if we consider just general business application software development, which is the world where Scrum is used the most, we find variations. Developing application code by hand is different from integrating and configuring third-party packages. Developing algorithmic code is different from writing a set of interfaces between services. Writing code in Java for Windows is different from maintaining existing COBOL code for z/OS.

So, it’s not practical to predefine specific technical practices for Scrum. Yet, within our particular context, the use of appropriate technical practices can reduce overhead activities with Scrum.

Learning and applying generally-accepted software design principles keeps the code base consistent, understandable, and maintainable. Pair programming reduces the need for formal code reviews by providing continuous code review in the normal course of work. Static code analysis can enforce coding standards without the need to spend precious time in code reviews. Test-driven development reduces the need for testers to check functionality manually, freeing their time to perform more valuable testing activities. It also facilitates changing pairing partners, as it’s much easier to shift into a user story in progress when there are unit tests around the code. Continuous integration, frequent small commits, and trunk-based development reduce overhead time spent in resolving merge conflicts and correcting defects at the integration level. The list goes on.

Spikes

Spikes, or architectural experiments, may be seen as value-add or not. They seem to add value in that they provide learning to the team about how the solution should emerge. Yet, they don’t directly result in shippable solution increments. Customers aren’t interested in paying for spikes; they’re an internal matter.

So, it’s important to time-box spikes. Some Scrum teams treat spikes the same as user stories. They estimate or size the spikes and prioritize them against other backlog items.

But the purpose of a spike is not to produce a solution increment. The purpose is to explore an open question. The result might be “this won’t work,” “this might work,” “we still need more information about this,” or “this is turning into a big deal; let’s stop now and cut our losses.”

It’s best to time-box spikes rather than estimating or sizing them. If you weren’t able to come to a conclusion within the time-box, it means the spike wasn’t defined clearly enough or the scope of the work proved to be greater than expected. You might decide you need another spike to continue the investigation, or you might decide it isn’t worth investing any more time and effort in the question.

Pairing Sessions

So far, I’ve closely associated time-boxing with overhead activities. Many teams time-box the working sessions when team members pair together. But, when people are pairing together on backlog items, aren’t they performing value-add work?

Yes, that’s value-add work. The pairs are building the next solution increment. That’s directly in line with what customers are paying for. So, why the time-boxing?

This case is a bit subtle. The work the pairs are performing is not overhead, but some of the issues that pair programming mitigates are overhead, and it’s often helpful to structure the pairing sessions in a way that maximizes the benefit.

When people are new to pair programming, they often try to “adopt” it without adopting it. They want to be seen (by management) as pairing, but they don’t (yet) fully appreciate the value, and they don’t (yet) really understand that there are specific techniques involved, beyond just sitting next to each other.

So, two people who know each other may “pair” together forever. Two people may take care to sit together so it will appear as if they’re pairing, but in fact they take different tasks from a user story and work on them separately. There are countless ways that people think of to look as if they’re pairing without actually pairing.

When that happens, the team is not gaining most of the key benefits of pairing: Continuous code review, knowledge transfer, broadening skills, bringing new team members up to speed, mentoring junior colleagues, catching minor errors before they become defects, reminding one another about coding standards and so forth.

To make sure these things happen, we might time-box pairing sessions and require partners to change at the end of each time-box. It’s okay if a different pair finishes a user story than the pair who started it.

Unlike the time we spend in the daily scrum and backlog refinement, the time we spend pairing is not something we want to reduce (unless we’re replacing it with mob programming, of course). But we do want to reduce the time we spend in code reviews, chasing down trivial bugs, learning new skills, mentoring junior colleagues, and helping new team mates learn the system.

There’s another issue with pairing that can be mitigated by time-boxing: It’s tiring. When we work alone, our attention waxes and wanes throughout the day. We focus for a few minutes, relax for a few minutes, focus again, relax again. In that way, we can work steadily throughout the day.

When we pair, we tend not to relax. We are focused all the time. I don’t know why this happens, but I’ve experienced it and many colleagues have, as well. We can lose track of time while pairing, and become exhausted before the end of the day. (Incidentally, mob programming does not have this characteristic.)

It’s helpful to break up the pairing time with intervals when we do other things. Some tasks lend themselves better to individual work anyway, and there are also administrative tasks that we all have to take care of during the day, and emails to answer, and all that.

Sizing

Many software development teams estimate or size their backlog items. They may provide a high-level estimate or size for large-scale, broadly-defined backlog items several weeks or months in advance, to support long-term product planning. As backlog items near their start date, the teams will refine the items to arrive at user stories, and estimate or size each story.

Novice teams usually spend more time on this than mature teams. Initially, the team may not be very skilled at defining backlog items. Some items will be very large, others very small, and many of them may not be entirely clear. The team has to spend some time discussing all this and figuring out what the items really mean. Often they will break up user stories into tasks, to try to gain a better understanding of what must be done to complete the story. At this stage, the team may not be skilled with collaborative working methods, and they use the tasks as a way to assign specific chunks of work to individual team members.

This sort of churn usually diminishes with time and experience. Teams learn to define backlog items clearly, to craft user stories that are similarly sized, and to work together collaboratively. Those improvements generally result in less overhead time for estimation or sizing.

A novice team may estimate stories in terms of absolute time. A first step toward improvement may be to estimate in terms of ideal time, using a load factor to account for non-productive time during the day. Further improvement may be to use points to size the stories relative to each other, but formulaically pegging a “point” to some amount of time, such as 4 hours. Eventually the team may learn to forget about hours and use the relative points without reference to time. Later still, they may learn to define their work in same-sized pieces, such that they only have to count the items rather than sizing them. Each incremental improvement in estimation practices reduces the amount of overhead necessary to prepare backlog items for work.
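
The progression above can be sketched as a handful of small Python functions. This is only an illustration: the function names and sample numbers are hypothetical, and the load factor is assumed here to be a multiplier of 1 or more applied to ideal hours (teams define it in different ways).

```python
import math

HOURS_PER_POINT = 4  # the pegged value used as an example above

def ideal_to_elapsed_hours(ideal_hours, load_factor):
    """Ideal time: scale uninterrupted hours by a load factor (assumed >= 1)."""
    return ideal_hours * load_factor

def pegged_points_to_hours(points, hours_per_point=HOURS_PER_POINT):
    """Pegged points: still a unit of time in disguise."""
    return points * hours_per_point

def sprints_needed(remaining, completed_per_sprint):
    """Relative points or plain item counts: divide what remains by what
    the team has been completing per sprint, with no reference to hours."""
    return math.ceil(remaining / completed_per_sprint)

print(ideal_to_elapsed_hours(6, 1.5))  # 9.0
print(pegged_points_to_hours(3))       # 12
print(sprints_needed(40, 12))          # relative points: 4
print(sprints_needed(23, 7))           # counting same-sized items: 4
```

Note that the last step needs no sizing session at all; the team only has to count items, which is where the overhead saving comes from.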

Retrospectives

Retrospectives are a standard Scrum event in which team members think about how the work has gone and how their interaction as a team has gone in the near-term past, and how they might improve something about their process or the quality of their working lives.

Clearly, this activity does not contribute directly to the development of the product. The activity is time-boxed so that teams can control the amount of time they have available for value-add work.

On the other hand, the retrospective is a critically important part of Scrum. It’s the principal way in which Scrum teams avoid slipping into a permanent stupor, mindlessly churning out one user story after another.

With that in mind, we don’t want to reduce the amount of time teams spend in retrospectives, as we want to do with other overhead activities. We need to use the time-box in this case to preserve the activity and keep it meaningful.

Conclusion

Scrum is not the perfect, final solution for effective software delivery. It was developed to address certain organizational issues, and it does a good job of that. But be careful about falling in love with specific details in the Scrum Guide. Think of it more as a starting point for continual improvement than as a permanent solution.

Every process has overhead. Overhead has a purpose. The purpose is to help position the team to perform value-add work effectively. But the overhead activities themselves are not value-add activities, as they don’t contribute directly to product development and customers are not interested in paying for them. Therefore, we want to minimize or eliminate overhead activities, when we can do so without compromising quality or delivery effectiveness.

When you are considering changing a process, think about whether the proposed change will reduce overhead and improve effectiveness, or whether it will increase overhead without improving effectiveness.

For example, using Mob Programming in conjunction with Scrum reduces overhead for the daily scrum, knowledge sharing, skill building, new team member learning, mentoring junior colleagues, avoiding minor mistakes before they become serious issues, formal code reviews, and dealing with blockers and impediments. On the other hand, adding a mid-sprint review event to the Scrum process adds overhead and takes away time the team could be spending on value-add work, while leaving the underlying problem(s) unsolved.