Surely everyone involved in developing software has heard of Brooks’s Law. First presented in the eponymous chapter of Frederick P. Brooks, Jr.’s classic The Mythical Man Month, it states: “Adding manpower to a late software project makes it later.” This “law” is much beloved by software developers as a handy bucket of cold water with which to cool the ardor of overly enthusiastic managers and executives. Lately, however, I’ve been thinking about Brooks’s Law and rereading The Mythical Man Month and I’m no longer as impressed with Brooks’s analysis as I once was. This is the second in what I expect will be a series of posts discussing some of the reasons why. The first post in the series discussed training costs.
As part of his “demythologizing of the man-month” (p. 25) Brooks points out that developing software is subject to what he calls “sequential constraints”. Brooks actually makes two points about sequential constraints, but he doesn’t draw a particularly clear distinction between them in his exposition, so I’ll start by teasing them apart. They are:
- All tasks, including software development, have some sequential constraints on their subtasks that determine the minimum time in which the whole task can be completed.
- Even if the rest of the work can be parallelized, communication needed to coordinate work can act as a sequential constraint, putting a lower bound on the time needed to complete a task.
Point one is a simple matter of logic. Virtually every task, from harvesting a field of crops, to having a baby, has some sequential constraints on some of its subtasks that determine that certain subtasks can only be done after other subtasks are complete. It’s impossible to complete the whole task in less time than the time it takes to do the longest sequential chain. By definition, subtasks that are not sequentially constrained can be done in parallel and so more people working on them at the same time will get them done sooner than fewer people.
Tasks vary in the both the nature and extent of the sequential constraints that apply to their subtasks. Brooks gives harvesting crops as an example of a task with very few sequential constraints and bearing a child as one that nothing but a long sequentially constrained chain. It’s worth noting, however, that all real-world tasks are sequentially constrained at some level — even harvesting a field of crops, for instance, requires that someone get to the farthest corner of the field, harvest what’s there, and bring it back. No matter how many field hands you hire and no matter how minuscule the part of the field each is responsible for there’s still no way to harvest the whole crop faster than that long leg could be completed.
For practical purposes, however, Brooks is right: harvesting crops is almost entirely parallelizable and bearing a child is almost entirely not. Before we get to how communication itself can act as a sequential constraint, let’s consider another task which is more sequentially constrained than harvesting crops but less so than having a baby, namely baking a cake. Consider for instance this simple cake recipe:
- Pre-heat the oven to 350°.
- Prepare the cake pans — greasing, lining, and flouring.
- Sift together flour, baking soda, and salt.
- Cream the butter, shortening, and sugar until light and fluffy.
- Add dry ingredients to butter/shortening/sugar mix.
- Mix in three eggs.
- Pour batter into cake pans.
- Bake for 25 to 30 minutes.
- Cool on racks for 10 minutes.
- Remove from pans and continue cooling.
As with most recipes, there are both opportunities for parallelism and unavoidable sequential constraints. If you had a three cooks in the kitchen one of them could prepare the cake pans while another sifts together the dry ingredients and a third creams the butter, shortening, and sugar. After that, the next three steps, up to pouring the batter into the cake pans, while sequentially constrained relative to each other, could be done in parallel with the oven heating. Thereafter, everything is sequentially constrained. No matter how many cooks you have, you have to heat the oven before you bake the cake and bake the cake before it can cool. Thus there’s no way to decrease the total time it takes to make a cake below about 45 minutes.
Now let’s consider how software development is sequentially constrained. As anyone who has written software knows, there are sequential constraints but where do they come from? There are certainly no physical constraints such as the one that keeps a baker from pouring batter into cake pans before the batter has been made. If, somehow, we knew at the beginning of a development project all the lines of code that needed to be written, we could type them in any order we wanted — the software would work just as well in the end. But the notion that we could know in advance all the code that needs to be written and type it like we were taking dictation is just crazy. Programming isn’t primarily a typing problem, it’s a thinking problem. And thoughts need to be thought in the proper order.
In fact, the only way to figure out how a software system ultimately fits together is to build it. In order to know, in detail, how part X is going to work we need to know how part Y, with which it interacts, is going to work. And the only way to know how Y is going to work is to build it. It may be that we can completely build X and then build Y or we may need to alternate — build a bit of X in order to develop enough information to build a bit of Y from which we learn enough to build another bit of X, and so on. It might also be equally possible to start by building X and then build Y or to start with Y and then build X. But however we do it, the flow of information about the system we are building are the inherent sequential constraints we operate under. Note that this has nothing to do with communication — even if the system were being built by a single developer these constraints would still constrain the order in which various parts of the system could be built.
Now, keeping in mind these inherent sequential constraints, let’s consider Brooks’s second point, that the need to communicate can itself act as a sequential constraint. This is the software development equivalent of Ahmdal’s Law from parallel computing which says that no matter how much you can parallelize a computation, it’ll never complete faster than the time it takes to combine the results of all the parallel computations. In both software development and parallel computing this is because communication is inherently sequential. If ten people — or ten CPUs — each have six minutes worth of information to convey to each other, it’s going to take at least an hour of elapsed time no matter how you slice it; ultimately each person is going to have to spend six minutes “transmitting” their information and fifty-four minutes “receiving” information from the other nine people.
To see how this effect plays out, imagine we have an idealized software development task whose coding can be partitioned among however many developers we like but for every ten hours a developer is going to spend coding, they need to spend one hour writing down what they’re going to do for the benefit of the rest of the team and everybody has to read everyone else’s notes. In other words, before each ten hours’ worth of coding, a developer spends an hour writing an email about what they are about to implement and sends it to all the other developers. Then they have to read the other developers’ emails, spending an hour to absorb each one. After all that communication, the developers can each code for ten hours. For simplicity, we’ll assume that even a developer working alone would spend the hour writing notes for themself documenting what they plan to do in the next ten hours.
Suppose the total coding time needed to develop the system is 100 person-hours. A single developer could do it in 110 hours, ten chunks of an hour of note writing followed by ten hours of coding. Two developers could do it in five chunks of work with each each chunk consisting of twelve hours of work: an hour writing notes, an hour reading the other developer’s notes, and ten hours coding. Thus for the team of two, the total elapsed time would be 60 hours, of which 10 would have been spent on communication. Five developers could complete the project in only two chunks but each chunk would be fifteen hours: one hour writing, four reading, and ten coding. Thus their elapsed time would be 30 hours with 10 hours spent on communication. Ten developers would be done in 20 hours elapsed time — ten hours of development after ten hours of communication.
Even if we could scale down the communication cost proportionally, so less than ten hours of individual work requires a proportionally smaller amount of email writing and reading, the elapsed communication time still stays at ten hours: twenty developers would spend a half-hour writing their emails and nine and a half hours reading nineteen emails before coding for five hours. Indeed, as the number of developers approaches infinity, the amount of time spent coding approaches zero as does the amount of time each developer spends writing their own email while the amount of time spent reading the infinite number of infinitesimally short emails from other developers approaches ten hours and the project as a whole still takes a minimum of ten hours to complete. Thus even when there are no other sequential constraints — when we assume that an infinite number of developers can each be given an infinitesimally small part of the project to work on in isolation — communication remains the one activity that must be performed sequentially.
In real software projects, of course, things are more complicated. The inherent sequential constraints — those that would affect a single developer working alone — interact with communication induced constraints in all sorts of complicated ways. For one thing, if we assume — as Brooks seems to — that the overall task is partitioned into subtasks, each to be developed by a single developer, then the way we do the partitioning can have dramatic affects on the amount of communication needed. If we split the system at its natural joints, then communication will be minimized — if subsystems are naturally decoupled then developers can work on their bit for a while, developing lots of information about how their part of the system works, which only they need to know, and just a little bit of information that they need to share with other developers. On the other hand, if the partitioning is poor, each developer’s part of the system will depend on many details of other parts and the developers will either need to communicate much more often or, more likely, will all go off in their own directions for a bit too long and then discover, when they compare notes, that they need to backtrack and rework things in order to make everything fit together.1
Another issue, which Brooks doesn’t mention, is that the need to communicate can stall productive work. One of the idealized aspects of the hypothetical project above is that the developers work in perfect lockstep — everyone communicates and then works for exactly ten hours and the cycle repeats. At no point is anyone stalled waiting for someone else. In real projects, some subtasks will be bigger than others leaving developers whose pieces happens to be smaller waiting after they’ve finished their work to communicate with developers whose pieces are larger. Every hour that they spend waiting is an hour that gets added to the total number of person-hours it takes to complete the project.
That all said, there’s nothing that says the only way to divide up a task is by partitioning it into pieces that are each implemented by a single developer. In fact there are all sorts of reasons, which I’ll talk about in a later post, that that might be a bad idea. For now, let’s just note that if we could avoid a strict upfront partitioning, and could let developers share ownership of the system as a whole, working together frequently and sharing ideas about how it all fits together, they could probably much more closely emulate the order of development that we would see if we watched a single developer build the whole system, constrained only by the inherent constraints of needing to build enough of X in order to know enough to build Y and discovering, as they go along, enough bits that can be naturally carved off and done in parallel to keep everybody busy.2
So how does all this relate to Brook’s Law? In the concluding paragraph of the chapter, right after he has stated his Law, Brooks goes on to say:
The number of months of a project depends upon its sequential constraints. The maximum number of men depends upon the number of independent subtasks. From these two quantities one can derive schedules using fewer men and more months. … One cannot, however, get workable schedules using more men and fewer months. (pp. 25-6)
The last sentence is only true if the number of workers currently on the project is sufficient to take advantage of all the opportunities for parallelism. For instance, suppose we have a project consisting of forty individual tasks, each of which will take a weeks’s worth of work by one person. Now suppose ten of those tasks are inherently sequentially constrained while the other thirty tasks can be done at any time, in any order. Because of the ten sequentially constrained tasks, the project can’t be completed in any less than ten calendar weeks. But suppose the project has been assigned to a two-person team. It will take them twenty weeks to do the whole project, ten weeks longer than the minimum. Clearly in this case, we can get “workable schedules using more men and fewer months” by adding one or two people to the team. A team of three would finish in a bit over thirteen weeks and four would finish in the minimum time of ten weeks. To say that we can’t reduce calendar time because of sequential constraints would only be correct if we had originally assigned the project to a four person team.
In general, given that Brooks’s Law is talking about late projects, that is, ones we badly underestimated in the first place, what’s the likelihood that our estimate of how many people we needed was exactly right? The real question, if we’re concerned about sequential constraints, is whether or not there’s work that could be done in parallel. Sometimes there is and sometimes there isn’t and assuming that there never is is just as foolish as assuming that there always is.
1. In other words, the only thing worse than paying the costs of communication is not paying the costs of communication. Because we will pay them eventually, with interest.
2. Obviously if the team pair programs then the partitioning problem is made quite a bit easier as n people need only n/2 tasks to keep everyone busy, rather than n.