Archive for the ‘Programming’ Category

Wherein I attempt to establish my testing cred

October 8, 2009

After my recent posting about what some of the folks I interviewed for Coders at Work had to say about unit testing and TDD, I’ve seen comments in various places from people who seem to think that I must never have really tried TDD or that I think unit testing is not useful. Neither of which is really true.

In the last two jobs I had before I quit full-time work to write books and do a bit of consulting I was, among other things, the guy who wrote the test framework used for testing all our Java code. My first framework, which I wrote in the early days of Weblogic, predated JUnit but was similar in intent—my goal was to make it as easy as possible for developers to write fine-grained tests and to pinpoint as precisely as possible the cause of test failures so that we could run tests in a continuous build-and-test system that tested every check-in and reported the results. (“Build and test monkeys” we called them at Weblogic and a stuffed monkey made the rounds of the developers’ desks, sitting on the desk of whoever had last broken something.)

In addition to working on the framework itself, I spent a lot of my time at Weblogic trying to figure out how we could have finer-grained tests that would run more quickly and provide easier to track down failures than the rather coarse-grained system tests we had a lot of. (At one point, when I was working on our EJB implementation, I came up with a trick that I’m still a bit proud of: to test our implementation of the state machine implementing the EJB life-cycle, we wrote an EJB that collected a list of tokens for each of the methods called on it by the EJB container and the test, after putting the EJB through its paces, fed that sequence of tokens to a parser generated from the grammar of legal paths through the state machine.)
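The trick is easier to show than to describe. Here’s a minimal sketch in Python rather than Java, with an invented, much-simplified life-cycle and a regular expression standing in for the generated parser; none of the names here are from the actual Weblogic code.

```python
import re

class RecordingBean:
    """Stand-in for the EJB under test: records one token per life-cycle call."""
    def __init__(self):
        self.tokens = []

    def set_context(self):     self.tokens.append("setContext")
    def ejb_create(self):      self.tokens.append("create")
    def business_method(self): self.tokens.append("business")
    def ejb_passivate(self):   self.tokens.append("passivate")
    def ejb_activate(self):    self.tokens.append("activate")
    def ejb_remove(self):      self.tokens.append("remove")

# The "parser generated from the grammar of legal paths" -- here a regular
# expression stands in for it. Legal path (invented for illustration): set
# context, create, then any mix of business calls and passivate/activate
# round trips, then remove.
LEGAL_PATHS = re.compile(r"setContext create (?:business |passivate activate )*remove")

def legal_path(tokens):
    return LEGAL_PATHS.fullmatch(" ".join(tokens)) is not None

# Put the bean through its paces the way a container would, then check
# that the recorded token sequence is a legal path through the machine.
bean = RecordingBean()
bean.set_context()
bean.ejb_create()
bean.business_method()
bean.ejb_passivate()
bean.ejb_activate()
bean.ejb_remove()
assert legal_path(bean.tokens)
assert not legal_path(["create", "business", "remove"])  # create before setContext
```

The nice property of this scheme is that the test doesn’t enumerate every legal call sequence; any sequence the container produces is checked against the grammar, so new container behavior is tested for free.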

At my next job, at another start-up founded by one of the Weblogic founders, I was hired early on not only to work on the server part of our product but to be in charge of our software development process. We set up another battery of build-and-test monkeys, using a test framework I had written after leaving Weblogic. And having experienced the benefits of pair programming and lots of testing at Weblogic, I tried to push our process toward something like XP though I don’t think we ever did enough of the practices to really count as an XP shop. Now, with six years of hindsight, I’m still not sure whether I’m glad or sad about that.

In my role as a developer at Kenamea, one of the biggest projects I worked on was implementing a transactional object store. Other than some proof of concept code I wrote by myself, that part of the system was almost entirely pair programmed and was possibly one of the most extensively unit tested parts of our product. I don’t actually recall whether we ever did test-first programming on it but my pair and I tried to be quite strict about writing unit tests for all the code we wrote. And the tests were definitely valuable. They gave us—as unit testing advocates always promise—confidence to dramatically refactor things when necessary and they told us when we had slipped up and broken something. On the other hand, the system was multithreaded which, as Joshua Bloch points out in Coders, always makes things much harder to test. Often the tests were sufficient to show the presence of a bug without pinpointing its location; hours or days of hard thinking were often required to track down those concurrency bugs and when we found them it was usually not at all clear that there were any unit tests we could have written that would have uncovered them.

When I quit Kenamea in order to spend some time hacking Lisp, I was still interested in unit testing and TDD. (There’s a reason that one of the first “practical” chapters in Practical Common Lisp is a simple test framework.) Shortly before I started work on PCL I read Kent Beck’s book Test Driven Development and did a few experiments with “pure” TDD which I recorded in a coding diary. The first was an implementation of an algorithm for generating starting positions for Fischer Random Chess, a version of chess where the pieces in the back row are positioned randomly, subject to a few constraints. Here’s the contemporaneous record of my attempt, complete with false starts and other brainos. (The code is in Common Lisp. Lispers may be shocked to see how little Lisp I knew then, a few months before I started work on PCL. And no guarantees that this code exemplifies good Lisp style or good anything else.)

The next problem I tackled with TDD was The “Impossible Book” problem, posed on the Test-driven Development Yahoo! group around that time. Again, I recorded my attempt as I went. I’m not sure these are great examples of TDD in action; I mention them merely as evidence that I have actually spent some time trying out TDD.

My Impossible Book attempt is also, perhaps, relevant to the current discussion since I was up against the same kind of difficulty as Ron Jeffries was while trying to write his Sudoku solver. Namely, I had no real idea how to solve the problem.

However—perhaps because there was no other infrastructure code to distract myself with—I was lucky enough to realize that no amount of testing was going to help me solve a problem I didn’t know how to solve. So instead I decided, as I say in my coding diary, that “TDD is about driving the implementation”. So I went off and read a bit about how to solve the problem and then used TDD to come up with an implementation. In this case I was lucky that there was already a readily accessible write up of a way to solve the problem. If there hadn’t been I would have had to do more research and probably would have had to learn some math in order to figure out how to apply it to the problem at hand. In other words, the thing that was going to speed me up was not trying harder with TDD but recognizing that I needed to step back and try something else.

Since those experiments with TDD, I’ve mostly been writing books so I haven’t been doing much work on production code. Working on my own coding projects, which tend to be exploratory, not very large, and special purpose (i.e. they only have to work for me) I have not found myself inclined to use TDD or even a lot of unit testing. I’m not claiming that that’s due to a principled appraisal of the benefits and drawbacks of TDD or unit tests; it just hasn’t felt like the problems I’ve had writing the software I’ve wanted to write would be fixed by having more unit tests nor that writing test-first would speed up my exploration.

And that, I guess, brings me back to how I was drawn into this conversation in the first place: I think testing, of all kinds, is an important part of serious software development. I think TDD is, at the very least, an interesting way to write software and I suspect that it might be a very good way to write some kinds of software. But I think the claim that TDD always speeds you up is just bunk. It may be that for the kinds of software Uncle Bob and Tim Bray write, in the kinds of organizations where they work, over the kinds of time scales they care about, it really does always speed things up. I’m happy to believe Uncle Bob when he says that he’s seen the benefits of TDD in his own and in others’ work.

But I also think that when Jamie Zawinski says that writing unit tests would have slowed down the first release of Netscape or when Donald Knuth says that he thinks he saved time by writing out TeX in a notebook without any testing at all until he had figured out how the whole program was going to work, those are data points that need to be accounted for, not dismissed with insults about “living in the stone age” and “being willfully ignorant”. Maybe Zawinski and Knuth are wrong about their own experience. Or maybe they were making different trade-offs than Uncle Bob and Bray would choose to make. At any rate, I agree with Tim Bray when he says

If you read the comments around this debate, it’s increasingly obvious that we as a profession don’t have consensus around the value of TDD. Many of our loudmouths (like me for example) have become evangelists probably to the point of being obnoxious. But there remains a strong developer faction out there, mostly just muttering in their beards, who think TDD is another flavor of architecture astronautics that’s gonna slow them down and get in their way.

Maybe TDD’s detractors are, as Uncle Bob claims, analogous to the 19th century surgeons pooh-poohing the benefits of washing their hands before surgery but I find Uncle Bob’s rhetorical stance of absolute certainty disconcerting and, ironically, anti-persuasive. That is, I thought better of TDD before I read his recent postings about it. But that’s a silly reason to accept or reject a practice that might do me some good. If I go back to writing production code, I’ll certainly resume my own contemplation of the best way to mix testing with development and wouldn’t be surprised if TDD found a place in my own practice.


Brooks’s Law and intercommunication: you talkin’ to me?

May 29, 2007

Surely everyone involved in developing software has heard of Brooks’s Law. First presented in the eponymous chapter of Frederick P. Brooks, Jr.’s classic The Mythical Man Month, it states: “Adding manpower to a late software project makes it later.” This “law” is much beloved by software developers as a handy bucket of cold water with which to cool the ardor of overly enthusiastic managers and executives. Lately, however, I’ve been thinking about Brooks’s Law and rereading The Mythical Man Month and I’m no longer as impressed with Brooks’s analysis as I once was. This is the third in a series of posts discussing some of the reasons why. The first post in the series discussed training costs and the second talked about sequential constraints.

In addition to training costs and sequential constraints, part of the justification for Brooks’s Law is the claim that as a team expands the cost of communication between team members grows faster than the total productivity of the team. As with the issue of training costs, Brooks has a point — the number of possible pairwise communication paths among a team of n people is n(n – 1)/2; that’s a simple fact of math. If you then assume that every possible pair will need to spend a certain amount of time communicating or, equivalently, that the whole team will have to get together for meetings whose total length is determined by the number of people on the team, then it’s true that the number of person-hours spent communicating will grow as the square of the number of people while total productivity, again measured in person-hours, will only increase linearly. We can all see where that’s heading — pretty soon the amount of time spent on communication will be greater than the total amount of time available to work on anything at all and nothing else will get done. But how soon?

To take a concrete example, suppose we’ve got a six-person team that we’re thinking of expanding to eight; should we be concerned that the increasing communication costs will eat up any additional productivity we might get from the two extra people? We can figure it out. Suppose that each pair on our team gets together for a pure-overhead, one-hour tête-à-tête every week. Assuming a week is five eight-hour work days, the whole team spends 30 person-hours on communication per week out of 240 person-hours worked, leaving 210 person-hours of productive work. What happens if we expand the team to eight? Each person will now spend seven hours a week in pairwise communication and the team as a whole will spend a total of 56 person-hours a week communicating. But the team will also now be able to do a total of 320 person-hours of work, leaving 264 person-hours to be spent on productive work, or 54 more hours. This calculation does demonstrate Brooks’s larger point, that a “man month” is not a useful measure of productivity — if it were, then expanding a team from six to eight, a 33% increase in size, would likewise increase productivity by 33%, not the approximately 26% we actually get. But this example doesn’t justify, by itself anyway, a blanket claim that adding people to a project will always slow it down — the eight-person team can, in fact, get more done than the six-person team and therefore should finish the same amount of work sooner, all other things being equal. Of course all other things are not necessarily equal — training costs can reduce the initial productivity of new team members and it’s conceivable that sequential constraints introduce a long leg that can’t be reduced by adding people. In a later post I’ll discuss whether these three factors together might be enough to justify Brooks’s Law.
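The arithmetic is easy to check with a few lines of code (my sketch, not Brooks’s). Each of the n(n – 1)/2 pairs meets for an hour a week, which costs two person-hours per pair; productive time is total hours worked minus communication time.

```python
# Check the six-vs-eight-person-team arithmetic from the paragraph above.
def productive_hours(n, hours_per_week=40, c=1):
    pairs = n * (n - 1) // 2       # pairwise communication paths
    communication = 2 * c * pairs  # each c-hour meeting ties up two people
    return n * hours_per_week - communication

print(productive_hours(6))  # 240 - 30 = 210
print(productive_hours(8))  # 320 - 56 = 264, i.e. 54 hours more
```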

So, for this team, a jump from six to eight people probably won’t add more communication costs than the productivity of the new people. But obviously Brooks is right that eventually quadratic growth will outpace linear growth.1 With a bit of math, we can figure out exactly when the costs of communication start outpacing gains in the ability to do useful work for a given amount of per-pair communication overhead and a given number of hours worked per week. If we have a team of n people who work h-hour weeks and in which each pair spends c hours per week on communication overhead, then the cost to the existing team members of adding one person to the team is n×c because each of the n current team members will now be part of a pair with the newcomer and will have to spend c extra hours tending to that pair’s communication needs. Meanwhile the newcomer will also spend n×c hours per week on pairwise communication which, when subtracted from the h total hours they’ll work each week, gives us the amount of non-communication work per week they’ll add to the team’s total capacity. When the amount of productivity lost by the existing team members is the same as the amount of productivity added by the newcomer there’s no point in expanding the team. So we can solve this equation for n:

n×c = h − n×c

to get this:

n = h / 2c

With this formula we can determine that for a team with one hour per week of per-pair overhead, working 40-hour weeks, the biggest the team can get without losing more productivity than it gains is 40 / (2 × 1) = 20.
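A quick numeric sanity check of that break-even point (again my own sketch, not from the original analysis):

```python
# Net productive hours gained by growing a team from n to n+1 people:
# the newcomer adds h - n*c hours and costs the existing members n*c hours.
def marginal_gain(n, h=40, c=1):
    return (h - n * c) - n * c

h, c = 40, 1
break_even = h / (2 * c)
print(break_even)         # 20.0
print(marginal_gain(19))  # still positive: 2 more hours per week
print(marginal_gain(20))  # zero: no point growing past 20
```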

But all of these computations may be beside the point, as they’re based on the assumption that communication has to be overhead. What if we had a team of six people that spend not one, but eight hours per day on pairwise communication because they spend all their time pair programming? If we add two people to that team, there is no change in time spent per person on pairwise communication — the only change is, assuming the team rotates partners, that each person will pair with each other person less often. But the communication time obviously can’t be all overhead — the only time anything gets done is when two people are communicating.

So how can this possibly square with Brooks’s analysis? One possibility is to join the ranks of the XP skeptics and simply deny that the pair programming team could possibly get anything done. I’ve had good experiences with pair programming though so I can’t buy that. I think the problem is with Brooks’s underlying assumptions. As I’ve mentioned previously, Brooks assumes that an n-person team will partition the task of writing whatever software they need to write into n pieces, each to be written by one person. To the extent that those pieces of software need to talk to each other, so do the people writing them and this communication is extra work on top of the base amount of work required to write the software. His arguments about training costs, intercommunication, and sequential constraints are all aimed at demonstrating that a task that a single developer could complete in x months will take n developers more than x/n months because the amount of work required for n developers is no longer simply x but x plus overhead.

But there’s another possibility. What if, as I suggested in my post about Sisyphus, n people don’t have more than x work to do but less than x because n people working together and communicating a lot are much more likely to discover a better solution than any one of them working alone? In that case, time spent communicating is not extra work but a way of reducing the total amount of work done.

1. One thing to note about the growth of intercommunication costs is that it is quadratic, not — as some writers have described it — exponential. Quadratic growth is faster than linear, for sure, but nowhere near as fast as exponential. Populations with no limits on their growth — bacteria in Petri dishes or rabbits in Australia — grow exponentially. If communication costs did grow exponentially with the size of the team, then a team would go from spending just slightly over half its time on communication to being able to do nothing but communicate, just by adding one person. One author who should certainly know better is Steve McConnell who described the growth of communication paths as “exponential” in Software Estimation (p. 57). In fact he did know better — in his earlier book, Rapid Development, he described the growth, correctly, as “multiplicative” (p. 311).

Brooks’s Law and sequential constraints: one damn thing after another

May 22, 2007

Surely everyone involved in developing software has heard of Brooks’s Law. First presented in the eponymous chapter of Frederick P. Brooks, Jr.’s classic The Mythical Man Month, it states: “Adding manpower to a late software project makes it later.” This “law” is much beloved by software developers as a handy bucket of cold water with which to cool the ardor of overly enthusiastic managers and executives. Lately, however, I’ve been thinking about Brooks’s Law and rereading The Mythical Man Month and I’m no longer as impressed with Brooks’s analysis as I once was. This is the second in what I expect will be a series of posts discussing some of the reasons why. The first post in the series discussed training costs.

As part of his “demythologizing of the man-month” (p. 25) Brooks points out that developing software is subject to what he calls “sequential constraints”. Brooks actually makes two points about sequential constraints, but he doesn’t draw a particularly clear distinction between them in his exposition, so I’ll start by teasing them apart. They are:

  1. All tasks, including software development, have some sequential constraints on their subtasks that determine the minimum time in which the whole task can be completed.
  2. Even if the rest of the work can be parallelized, communication needed to coordinate work can act as a sequential constraint, putting a lower bound on the time needed to complete a task.

Point one is a simple matter of logic. Virtually every task, from harvesting a field of crops to having a baby, has some sequential constraints on some of its subtasks that determine that certain subtasks can only be done after other subtasks are complete. It’s impossible to complete the whole task in less time than the time it takes to do the longest sequential chain. By definition, subtasks that are not sequentially constrained can be done in parallel and so more people working on them at the same time will get them done sooner than fewer people.

Tasks vary in both the nature and extent of the sequential constraints that apply to their subtasks. Brooks gives harvesting crops as an example of a task with very few sequential constraints and bearing a child as one that is nothing but a long sequentially constrained chain. It’s worth noting, however, that all real-world tasks are sequentially constrained at some level — even harvesting a field of crops, for instance, requires that someone get to the farthest corner of the field, harvest what’s there, and bring it back. No matter how many field hands you hire and no matter how minuscule the part of the field each is responsible for there’s still no way to harvest the whole crop faster than that long leg could be completed.

For practical purposes, however, Brooks is right: harvesting crops is almost entirely parallelizable and bearing a child is almost entirely not. Before we get to how communication itself can act as a sequential constraint, let’s consider another task which is more sequentially constrained than harvesting crops but less so than having a baby, namely baking a cake. Consider for instance this simple cake recipe:

  1. Pre-heat the oven to 350°.
  2. Prepare the cake pans — greasing, lining, and flouring.
  3. Sift together flour, baking soda, and salt.
  4. Cream the butter, shortening, and sugar until light and fluffy.
  5. Add dry ingredients to butter/shortening/sugar mix.
  6. Mix in three eggs.
  7. Pour batter into cake pans.
  8. Bake for 25 to 30 minutes.
  9. Cool on racks for 10 minutes.
  10. Remove from pans and continue cooling.

As with most recipes, there are both opportunities for parallelism and unavoidable sequential constraints. If you had three cooks in the kitchen one of them could prepare the cake pans while another sifts together the dry ingredients and a third creams the butter, shortening, and sugar. After that, the next three steps, up to pouring the batter into the cake pans, while sequentially constrained relative to each other, could be done in parallel with the oven heating. Thereafter, everything is sequentially constrained. No matter how many cooks you have, you have to heat the oven before you bake the cake and bake the cake before it can cool. Thus there’s no way to decrease the total time it takes to make a cake below about 45 minutes.
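That 45-minute floor is just the length of the longest chain through the recipe’s dependency graph, which is easy to compute. In this sketch the durations of the prep steps and the preheat are my own rough guesses; only the 25-minute bake and the 10-minute cool come from the recipe itself.

```python
from functools import lru_cache

# The recipe above as a dependency graph. Each step lists the steps that
# must finish before it can start; prep-step durations are guesses.
STEPS = {  # name: (minutes, prerequisites)
    "heat oven": (10, []),
    "prep pans": (5,  []),
    "sift dry":  (5,  []),
    "cream":     (5,  []),
    "add dry":   (2,  ["sift dry", "cream"]),
    "mix eggs":  (2,  ["add dry"]),
    "pour":      (1,  ["mix eggs", "prep pans"]),
    "bake":      (25, ["pour", "heat oven"]),
    "cool":      (10, ["bake"]),
}

@lru_cache(maxsize=None)
def earliest_finish(step):
    """Earliest minute this step can finish, given unlimited cooks."""
    minutes, prereqs = STEPS[step]
    return minutes + max((earliest_finish(p) for p in prereqs), default=0)

print(earliest_finish("cool"))  # 45 -- no number of extra cooks can beat it
```

With these guesses the critical path runs batter → bake → cool; adding cooks can only shorten the fully parallel prep at the front, which the preheat already covers.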

Now let’s consider how software development is sequentially constrained. As anyone who has written software knows, there are sequential constraints but where do they come from? There are certainly no physical constraints such as the one that keeps a baker from pouring batter into cake pans before the batter has been made. If, somehow, we knew at the beginning of a development project all the lines of code that needed to be written, we could type them in any order we wanted — the software would work just as well in the end. But the notion that we could know in advance all the code that needs to be written and type it like we were taking dictation is just crazy. Programming isn’t primarily a typing problem, it’s a thinking problem. And thoughts need to be thought in the proper order.

In fact, the only way to figure out how a software system ultimately fits together is to build it. In order to know, in detail, how part X is going to work we need to know how part Y, with which it interacts, is going to work. And the only way to know how Y is going to work is to build it. It may be that we can completely build X and then build Y or we may need to alternate — build a bit of X in order to develop enough information to build a bit of Y from which we learn enough to build another bit of X, and so on. It might also be equally possible to start by building X and then build Y or to start with Y and then build X. But however we do it, the flow of information about the system we are building imposes the inherent sequential constraints we operate under. Note that this has nothing to do with communication — even if the system were being built by a single developer these constraints would still constrain the order in which various parts of the system could be built.

Now, keeping in mind these inherent sequential constraints, let’s consider Brooks’s second point, that the need to communicate can itself act as a sequential constraint. This is the software development equivalent of Amdahl’s Law from parallel computing which says that no matter how much you can parallelize a computation, it’ll never complete faster than the time it takes to combine the results of all the parallel computations. In both software development and parallel computing this is because communication is inherently sequential. If ten people — or ten CPUs — each have six minutes worth of information to convey to each other, it’s going to take at least an hour of elapsed time no matter how you slice it; ultimately each person is going to have to spend six minutes “transmitting” their information and fifty-four minutes “receiving” information from the other nine people.

To see how this effect plays out, imagine we have an idealized software development task whose coding can be partitioned among however many developers we like but for every ten hours a developer is going to spend coding, they need to spend one hour writing down what they’re going to do for the benefit of the rest of the team and everybody has to read everyone else’s notes. In other words, before each ten hours’ worth of coding, a developer spends an hour writing an email about what they are about to implement and sends it to all the other developers. Then they have to read the other developers’ emails, spending an hour to absorb each one. After all that communication, the developers can each code for ten hours. For simplicity, we’ll assume that even a developer working alone would spend the hour writing notes for themselves documenting what they plan to do in the next ten hours.

Suppose the total coding time needed to develop the system is 100 person-hours. A single developer could do it in 110 hours, ten chunks of an hour of note writing followed by ten hours of coding. Two developers could do it in five chunks of work with each chunk consisting of twelve hours of work: an hour writing notes, an hour reading the other developer’s notes, and ten hours coding. Thus for the team of two, the total elapsed time would be 60 hours, of which 10 would have been spent on communication. Five developers could complete the project in only two chunks but each chunk would be fifteen hours: one hour writing, four reading, and ten coding. Thus their elapsed time would be 30 hours with 10 hours spent on communication. Ten developers would be done in 20 hours elapsed time — ten hours of development after ten hours of communication.

Even if we could scale down the communication cost proportionally, so less than ten hours of individual work requires a proportionally smaller amount of email writing and reading, the elapsed communication time still stays at ten hours: twenty developers would spend a half-hour writing their emails and nine and a half hours reading nineteen emails before coding for five hours. Indeed, as the number of developers approaches infinity, the amount of time spent coding approaches zero as does the amount of time each developer spends writing their own email while the amount of time spent reading the infinite number of infinitesimally short emails from other developers approaches ten hours and the project as a whole still takes a minimum of ten hours to complete. Thus even when there are no other sequential constraints — when we assume that an infinite number of developers can each be given an infinitesimally small part of the project to work on in isolation — communication remains the one activity that must be performed sequentially.
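The numbers in the hypothetical project above fall out of a two-line model (my own restatement; the ten-hour communication floor is what the limit argument shows):

```python
# 100 person-hours of coding, split evenly; before each ten-hour block of
# coding every developer writes notes for an hour and spends an hour
# reading each other developer's notes.
def elapsed_hours(developers, total_coding=100):
    chunks = total_coding / (developers * 10)    # rounds of write/read/code
    hours_per_chunk = 1 + (developers - 1) + 10  # write + read + code
    return chunks * hours_per_chunk

for n in (1, 2, 5, 10):
    print(n, elapsed_hours(n))  # 110, 60, 30, and 20 hours respectively
```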

In real software projects, of course, things are more complicated. The inherent sequential constraints — those that would affect a single developer working alone — interact with communication induced constraints in all sorts of complicated ways. For one thing, if we assume — as Brooks seems to — that the overall task is partitioned into subtasks, each to be developed by a single developer, then the way we do the partitioning can have dramatic effects on the amount of communication needed. If we split the system at its natural joints, then communication will be minimized — if subsystems are naturally decoupled then developers can work on their bit for a while, developing lots of information about how their part of the system works, which only they need to know, and just a little bit of information that they need to share with other developers. On the other hand, if the partitioning is poor, each developer’s part of the system will depend on many details of other parts and the developers will either need to communicate much more often or, more likely, will all go off in their own directions for a bit too long and then discover, when they compare notes, that they need to backtrack and rework things in order to make everything fit together.1

Another issue, which Brooks doesn’t mention, is that the need to communicate can stall productive work. One of the idealized aspects of the hypothetical project above is that the developers work in perfect lockstep — everyone communicates and then works for exactly ten hours and the cycle repeats. At no point is anyone stalled waiting for someone else. In real projects, some subtasks will be bigger than others, leaving developers whose pieces happen to be smaller waiting after they’ve finished their work to communicate with developers whose pieces are larger. Every hour that they spend waiting is an hour that gets added to the total number of person-hours it takes to complete the project.

That all said, there’s nothing that says the only way to divide up a task is by partitioning it into pieces that are each implemented by a single developer. In fact there are all sorts of reasons, which I’ll talk about in a later post, that that might be a bad idea. For now, let’s just note that if we could avoid a strict upfront partitioning, and could let developers share ownership of the system as a whole, working together frequently and sharing ideas about how it all fits together, they could probably much more closely emulate the order of development that we would see if we watched a single developer build the whole system, constrained only by the inherent constraints of needing to build enough of X in order to know enough to build Y and discovering, as they go along, enough bits that can be naturally carved off and done in parallel to keep everybody busy.2

So how does all this relate to Brooks’s Law? In the concluding paragraph of the chapter, right after he has stated his Law, Brooks goes on to say:

The number of months of a project depends upon its sequential constraints. The maximum number of men depends upon the number of independent subtasks. From these two quantities one can derive schedules using fewer men and more months. … One cannot, however, get workable schedules using more men and fewer months. (pp. 25-6)

The last sentence is only true if the number of workers currently on the project is sufficient to take advantage of all the opportunities for parallelism. For instance, suppose we have a project consisting of forty individual tasks, each of which will take a week’s worth of work by one person. Now suppose ten of those tasks are inherently sequentially constrained while the other thirty tasks can be done at any time, in any order. Because of the ten sequentially constrained tasks, the project can’t be completed in any less than ten calendar weeks. But suppose the project has been assigned to a two-person team. It will take them twenty weeks to do the whole project, ten weeks longer than the minimum. Clearly in this case, we can get “workable schedules using more men and fewer months” by adding one or two people to the team. A team of three would finish in a bit over thirteen weeks and four would finish in the minimum time of ten weeks. To say that we can’t reduce calendar time because of sequential constraints would only be correct if we had originally assigned the project to a four-person team.
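That schedule arithmetic generalizes (my own formula for the example above): elapsed time is total work divided by team size, floored by the length of the sequential chain.

```python
# Forty one-week tasks: a ten-week sequential chain plus thirty freely
# parallelizable tasks. No schedule can beat the chain itself.
def elapsed_weeks(team_size, chain=10, parallel=30):
    return max(chain, (chain + parallel) / team_size)

for k in (2, 3, 4, 5):
    print(k, elapsed_weeks(k))  # 20.0, 13.33..., 10, 10
```

Note that past four people the curve goes flat: the fifth person buys nothing, which is where the sequential-constraint argument genuinely bites.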

In general, given that Brooks’s Law is talking about late projects, that is, ones we badly underestimated in the first place, what’s the likelihood that our estimate of how many people we needed was exactly right? The real question, if we’re concerned about sequential constraints, is whether or not there’s work that could be done in parallel. Sometimes there is and sometimes there isn’t and assuming that there never is is just as foolish as assuming that there always is.

1. In other words, the only thing worse than paying the costs of communication is not paying the costs of communication. Because we will pay them eventually, with interest.

2. Obviously if the team pair programs then the partitioning problem is made quite a bit easier as n people need only n/2 tasks to keep everyone busy, rather than n.

Brooks’s Law: training costs, but not as much as you might think

May 17, 2007

Surely everyone involved in developing software has heard of Brooks’s Law. First presented in the eponymous chapter of Frederick P. Brooks, Jr.’s classic The Mythical Man Month, it states: “Adding manpower to a late software project makes it later.” This “law” is much beloved by software developers as a handy bucket of cold water with which to cool the ardor of overly enthusiastic managers and executives. Lately, however, I’ve been thinking about Brooks’s Law and rereading The Mythical Man Month and I’m no longer as impressed with Brooks’s analysis as I once was. This is the first in what I expect will be a series of posts discussing some of the reasons why.

When Brooks says that adding manpower makes a late project later, he doesn’t specify what he means by later. Later than it already is? Almost certainly, but so what? Later than your new wildly optimistic estimate? Probably, but again not all that interesting. The slightly paradoxical interpretation that makes Brooks’s Law such a perennial on amusing quotation lists is: later than it would have been if you had just left well enough alone.

Of the various reasons Brooks gives in the chapter “The Mythical Man Month” for projects running out of calendar time, the only one that has specifically to do with adding staff to an existing project is the cost of training the added staff. There are other costs associated with having a bigger team, such as potentially increased intercommunication costs and the need to repartition tasks. I’ll discuss those costs in later posts but for now I’m concerned only with whether Brooks’s own analysis of the costs of training holds water.

If we were to take Brooks’s Law as literally true, then we would have to believe that the costs of training new staff will always be higher than any capacity for productive work they might eventually develop. That seems unlikely. However, Brooks’s Law only refers to “late” projects so perhaps there’s something about being late that makes it true. Unfortunately, he doesn’t define “late” any more than he defines “later” so if we want to apply Brooks’s Law wisely we’re on our own — we need to ask, when can we get more done by adding staff than by not?

Much more often than Brooks lets on, it seems. In the section “Regenerative Schedule Disaster” Brooks uses a hypothetical project, originally estimated to be twelve person-months of effort and assigned to a three person team, to demonstrate how training costs affect our ability to speed up a project. In his scenario the project has been divided into four milestones, each of which should be completed in one calendar month by the team of three, i.e. three person-months per milestone. Unfortunately it takes the team two calendar months, or six person-months, to finish the first milestone, so there are only two months left to complete the remaining three milestones. Brooks then considers two sub-scenarios — one where only the first milestone was mis-estimated, in which case there are nine person-months worth of work left and two months in which to do it, and another where the underestimation was systematic so the three remaining milestones are all, like the first, six person-months of work leaving eighteen person-months of work. The question he then poses is, what happens if a manager attempts to get the project finished in the remaining two calendar months by adding staff.

In the first sub-scenario, a manager who ignores training costs would calculate that they need four and a half people to do nine person-months of work in two months. Rounding up to five, subtracting the three they’ve already got, and they add two people. In the second scenario, eighteen divided by two is nine, subtract the three they’ve got, and they’d need to grow by six. Brooks then analyzes the first sub-scenario, making the rather conservative assumption that it’ll take one month of full-time work by one of the existing team members to train the two newcomers before they’ll be able to do any work. Under that assumption, only two people will do productive work during the third month so only two more person-months of actual work will be done, leaving seven. In the fourth, and final, month, the new people will start contributing and the trainer can get back to real work but it’s too late — they’ll get five person-months worth of work done but with seven left to do the schedule is blown.

But there’s another way to look at it. With the two newcomers, the team managed to complete a total of thirteen person-months worth of actual (non-training) work, or almost 87% of the originally planned functionality (assuming the revised estimate of fifteen person-months for the whole project is correct). What would have happened if they had heeded Brooks’s Law and just kept going with the original three-person team? They’d have completed only twelve person-months, or 80% of the originally planned effort. Or, if it’s more important to deliver 100% of the functionality as soon as possible, the original team would have needed another month, blowing the schedule by 25%, while the augmented team would only need an additional two-fifths of a month, or about 10% over the original schedule.

In Brooks’s second sub-scenario, where the actual project size is assumed to be twenty-four person-months, the benefits of adding staff are even more pronounced. Assuming the same one month of full-time training, the augmented team finishes almost 71% of the originally planned effort in four months compared to only 50% by the original team. Or they can finish the whole project in a bit less than five months total, extending the original schedule by about 20%, compared to the 100% by which the original team would blow the original schedule.
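Both sub-scenarios can be checked with a short Python sketch (my own encoding of the assumptions above: newcomers arrive at the start of month three, and one veteran trains them full-time for that month, whatever their number):

```python
def scenario(total_pm, added):
    """Brooks's four-month project, three-person team. Months 1-2: six
    person-months done. Month 3: one veteran trains the `added`
    newcomers, so only two people produce. Month 4: everyone produces."""
    done_by_deadline = 6 + 2 + (3 + added)
    months_to_finish = 4 + (total_pm - done_by_deadline) / (3 + added)
    return done_by_deadline, months_to_finish

print(scenario(15, 2))  # first sub-scenario: 13 of 15 pm done, ~4.4 months
print(scenario(24, 6))  # second sub-scenario: 17 of 24 pm done, ~4.8 months
```

The augmented teams don’t hit the four-month deadline, but in both cases they finish more of the work by that deadline, and the whole project sooner, than the original three would have.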

The problem is not that adding staff to the project didn’t help; it’s that it didn’t help quite enough. You might ask, why not account for the training costs when figuring out how many new staff are needed? Brooks briefly considers that idea and rejects it on the grounds that the seven person team needed in the first sub-scenario to finish the remaining seven person-months worth of work after training would be too different in kind from a five person team for it to be feasible. That may be true but the question remains, what’s the alternative? Brooks considers attempts to finish the project on the original four-month schedule “disastrous” and recommends that we should instead “reschedule” or “trim the task”. Both of those are probably wise strategies but even with Brooks’s conservative assumptions about training time, the expanded team would still get more done needing either less of a schedule slip or less trimming of functionality.

At any rate, it’s not the case, in either sub-scenario, that training costs on their own would cause the project to finish later with additional staff than it would have without. Brooks does, however, make one important point when he says:

Notice by the end of the third month, things look very black. The [second] milestone has not been reached in spite of all the managerial effort. The temptation is very strong to repeat the cycle, adding yet more manpower. Therein lies madness. (pp. 24-5)

It is important not to lose one’s nerve. If you’ve already used up two months of a four-month schedule, it’s going to be queasy-making to reduce your productivity by a third for another whole month. If you do, you’ve got to stick with it to reap the benefits as your new workers get up to speed. It also suggests two bits of tactics. First: make sure you add enough new staff. If you’re going to take the hit of losing the output of one or more of your currently productive workers to training, you want to make sure you get as big a return on that investment as possible — add as many people as you can afford and as you think can be trained in a reasonable amount of time. Second: make sure you invest enough in training. In his own reappraisal of Brooks’s Law, Steve McConnell called Brooks’s assumptions about training costs “absurdly conservative” and they may be. But notice that even with those conservative assumptions the investment can still pay off quickly. It can be tempting to try to cheat, adding staff without explicit training, hoping they’ll somehow get up to speed on their own. If it works, great, but more likely they’ll just nibble away at the productivity of the current staff without ever becoming productive enough to offset the cost. Better to plan conservatively and then end the training ahead of schedule if the newcomers are ready to do fully productive work sooner than planned.

One woman can’t have a baby in nine months

May 10, 2007

As all software developers know, nine women can’t have a baby in a month. Or in Fred Brooks’s more elegant phrasing: “The bearing of a child takes nine months, no matter how many women are assigned.” (The Mythical Man Month, p. 17) The point, of course, is that some tasks are, as Brooks would say, “sequentially constrained”. They’re going to take a certain amount of time no matter what — the time can’t be reduced by having more people work on them.

On the other hand, is it actually the case that one woman can have a baby in nine months? Suppose we have just been put in charge of Project New Baby that must produce a brand new baby in nine months. How should we staff the project? Easy enough — nine women can’t have a baby in a month, right? No point in overstaffing so we’ve just got to find a couple, make sure they’re both fertile and interested in having a kid, and we’re good to go. But wait a sec’, what’s the chance they’ll miss our nine-month deadline by more than a month? Pretty high, it turns out.

Typically a couple trying to get pregnant has about a 16% chance of conceiving in any given month. Once they’ve conceived, there’s, sadly, about a 15–20% chance of miscarriage, usually within the first three months. So the chance our couple will produce a baby nine months from now is only .16 × .85, or 13.6%. If we wanted to we could compute the average time we should expect it to take for one couple to have a baby, using math similar to that in an earlier post. But suppose the deadline is hard — we really, really need to finish Project New Baby in nine months — is there anything we can do?

Sure. Throw bodies at it. While a single couple has an 86.4% chance of missing our deadline, if we had two couples, the chance that they’d both miss it is only .864^2 or about 74.6%. With three couples, the chance of blowing it is down to 64.4%. To figure out how many couples we need to have a P chance of hitting our deadline, just plug P into this formula:

n = ⌈ log(1 − P) / log(.864) ⌉
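The calculation is a one-liner in Python (the function name is mine; the 13.6% per-couple chance is the one derived above):

```python
import math

def couples_needed(p_target, p_one_couple=0.136):
    """Smallest number of couples such that the chance at least one
    delivers by the deadline is at least p_target, given each couple's
    independent 13.6% individual chance."""
    return math.ceil(math.log(1 - p_target) / math.log(1 - p_one_couple))

print(couples_needed(0.90))  # 16
```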

Of course, this could get expensive if we need to be really certain of hitting that deadline — to have a 90% chance of hitting it we’d need sixteen couples. But depending on how important Project New Baby and its deadline are, it might be worth it.

So what in software is like making babies? Let’s take a look at how Brooks himself tied making babies to making software:

The bearing of a child takes nine months, no matter how many women are assigned. Many software tasks have this characteristic because of the sequential nature of debugging. (p. 17)

Unfortunately, I haven’t been able to find anywhere where he explains what he means by “the sequential nature of debugging” but I can see how debugging is like having a baby. And not just that they both can be incredibly painful and that you have a great feeling of relief when you’re done. The similarity that I see is that the time it takes to find a bug has a large random component, like trying to conceive a child. Basically when you’re looking for a bug, there’s some probability p that you’ll find the bug for each unit of time that you spend looking, just like a couple has a 16% chance of getting pregnant for each month they spend trying. If you’re a skillful debugger and know your code really well p will be higher but there’s always a random element — if you go down the wrong path it can take you a while to realize it and all that time is lost whereas if you had tried a different path first, you might have found the bug right away. This is why it’s almost useless to try to estimate how long it will take to find a bug. You could find it in the next five minutes or five weeks from now. Once you find the bug of course you also have to fix it but that tends to be less random — unlike a pregnancy, which always lasts about nine months after conception, different bugs will require more or less work to fix, but once you’ve found it you can usually characterize how big a job it’ll be. And for many, if not most, bugs finding them is the hard part — once you’ve well and truly tracked them down, the fix is often trivial.

All of which suggests we can apply the same technique to speed up debugging as we did on Project New Baby — throw bodies at it. Suppose we’re ten days from the end of a release and there’s one last serious bug to be tracked down. Suppose my chance of finding it is 10% per day. The chance that I won’t find it in the next ten days is (1 − .1)^10 or about 35%. But if there’s someone else who can also look for it — say a pair programming partner — who also has a 10% chance of finding it per day, and we both work at it separately, then the chance that the bug will remain at large by the end of the release drops to 12%. If we can throw even more developers at it, the chances of the bug escaping drop even more: 4% with three developers, 1% with four, 0.5% with five.
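The arithmetic, sketched in Python (same assumptions as above: a flat 10% per-day chance per developer, each searching independently):

```python
def p_bug_survives(devs, per_day=0.10, days=10):
    """Chance the bug is still unfound after `days` days, with `devs`
    developers each having an independent `per_day` chance of finding it."""
    return (1 - per_day) ** (devs * days)

for d in range(1, 6):
    print(d, round(p_bug_survives(d), 3))  # 0.349, 0.122, 0.042, 0.015, 0.005
```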

Obviously, taking advantage of this strategy requires having multiple developers with enough familiarity with the code to be able to pitch in. Which seems to me a strong argument for practices such as pair programming and collective code ownership. An interesting side question is whether, if you do have developers to throw at debugging in this way, it is better for them to work independently or to pair up for the debugging on the grounds that two heads are better than one.

If Sisyphus had only had a partner

May 8, 2007

While working on another blog entry (still in progress) about Brooks’s Law, I got to thinking about pair programming and how it’s possible that two people working together, sitting at one computer, can be more productive than the same two people working on their own and combining their work. I certainly believe they can, based on my own experiences with pair programming. But after immersing myself in Brooksian notions of how communication costs quickly eat up all available productivity it seems a bit of a paradox. To the extent that writing software is like carrying rocks up a hill — and doesn’t it often feel that way? — here’s an explanation.

Suppose you have a hundred heavy rocks that you need to carry up a hill. They’re not so heavy that you can’t do it but they’re heavy enough that moderately often you’ll lose your grip and the rock will roll back to the bottom of the hill. Let’s say on each attempt to carry a rock up the hill there’s a 70% chance you’ll lose your grip. Assume that when you don’t drop the rock it takes five minutes to carry it up the hill and a minute to walk back down. First question: how long will it take you to get all the rocks to the top of the hill? Obviously in practice it depends on how often that 70% chance of dropping the rock actually bites you, but we can figure out an expected value. If the drops are randomly distributed — sometimes near the bottom of the hill and sometimes near the top — you’ll lose an average of three minutes per drop. But once you drop a rock you have to start all over again with it and there’s a chance you’ll drop it again. Thus the amount of time you should expect to spend on each rock is six minutes plus the sum of this infinite series:

3(.7) + 3(.7)^2 + 3(.7)^3 + ⋯ = 3 × .7/(1 − .7) = 7 minutes

Add that seven minutes to the six minutes of an uneventful round trip and we get an average of thirteen minutes per rock, or 1,300 minutes for all one hundred rocks.
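The per-rock expectation works out like this (a sketch; the three wasted minutes per drop, and the p/(1 − p) expected number of drops before a success, come from the setup above):

```python
def expected_minutes_per_rock(p_drop=0.7, round_trip=6, lost_per_drop=3):
    """Clean round trip plus wasted time: on average p/(1-p) drops
    before a success, each costing `lost_per_drop` minutes."""
    return round_trip + lost_per_drop * p_drop / (1 - p_drop)

per_rock = expected_minutes_per_rock()
print(round(per_rock))        # 13 minutes per rock
print(round(100 * per_rock))  # 1300 minutes for all hundred
```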

Now suppose you had a partner. Assuming there’s room for two people to carry rocks at the same time, one way to reduce the time it takes to get all the rocks to the top of the hill would be to simply each carry fifty rocks — the 1,300 minutes would be cut in half, to 650 minutes. But there’s another possibility — since the rocks are just a bit too heavy for one person to manage 100% reliably perhaps the two of you working together would be strong enough to never drop a rock. In that case, you could carry all hundred rocks up without dropping any and the whole job would take only 600 minutes, even better than splitting the work.

Of course if the chance of one person dropping a rock was lower, then working separately might be a better bet. In fact we can figure out exactly what probability makes it better to work separately or together by solving this inequality for p:

[x + (x/2) × p/(1 − p)] / 2 > x

The numerator of the left hand side represents the expected time it’ll take for one person to get one rock to the top if it takes x minutes with no drops. We divide by two to account for the fact that there are two people working at it. The right hand side represents the time taken with both folks working together and never dropping a rock. After some algebra the xs all go away and it turns out that when the probability of dropping a rock is greater than ⅔ it’s better to pair up than to work separately.
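Exact arithmetic confirms the break-even point (a sketch using Python’s Fraction to avoid rounding; the x/2 wasted minutes per drop follow the averaging assumption from earlier):

```python
from fractions import Fraction

def expected_solo(x, p):
    """Expected minutes for one person to land one rock: x clean
    minutes plus x/2 wasted minutes per expected drop."""
    return x + Fraction(x, 2) * p / (1 - p)

# At p = 2/3 the two strategies tie exactly, whatever x is:
for x in (6, 10, 30):
    assert expected_solo(x, Fraction(2, 3)) / 2 == x

# Above 2/3, pairing wins:
assert expected_solo(6, Fraction(7, 10)) / 2 > 6
```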

Now, a ⅔ chance may seem fairly high but it’s worth thinking about where that probability comes from. Let’s consider how a ⅔ chance of dropping the rock over five minutes relates to the chance that we’ll drop it in any single minute. To back out the per-minute chance of dropping, given the total probability of dropping and the number of minutes, we start by recognizing that the probability of dropping is equivalent to one minus the probability of not dropping. And to not drop for five minutes we need to not drop for one minute, five times in a row. More generally, to not drop for m minutes, we need to not drop for one minute, m times in a row. If h is the probability of holding (i.e. not dropping) for one minute, and the probability of holding in any one minute is independent of any other minute (i.e. dropping is more or less random and not the result of fatigue), then the combined probability of holding for m minutes is h^m. Thus if D is the probability that we’ll drop a rock any time in m minutes, then we can figure out h, the probability that we can hold a rock for a minute, and from there, trivially, d, the probability that we’ll drop it in any one minute, for a given D and number of minutes m as shown here:

h = (1 − D)^(1/m)   and   d = 1 − h = 1 − (1 − D)^(1/m)

Plugging our ⅔ chance and 5 minutes into this formula we find that a ⅔ chance of dropping over five minutes is equivalent to about a 20% chance of dropping in any single minute. If we want to find out the probability that we’ll drop a rock over an m-minute trip, given d, we can use this formula:

D = 1 − (1 − d)^m

Or perhaps more to the point we can solve this inequality:

1 − (1 − d)^m > ⅔

to determine the relationship between d and m that determines when the total probability of dropping is greater than the ⅔ chance that makes it worthwhile to pair up rather than working separately:

d > 1 − (⅓)^(1/m)

With this formula we can see that if it only took us three minutes to climb the hill, we could live with up to a 31% chance of dropping per minute before pairing would make sense. But if it took us 20 minutes, then we’d do well to pair up even if every minute we had almost a 95% chance of keeping our grip.
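In Python, the solved inequality gives the threshold directly (function name is mine):

```python
def pairing_threshold(m):
    """Per-minute drop probability above which pairing beats working
    separately on an m-minute climb: d > 1 - (1/3) ** (1/m)."""
    return 1 - (1 / 3) ** (1 / m)

for m in (3, 5, 20):
    print(m, round(pairing_threshold(m), 3))  # 0.307, 0.197, 0.053
```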

So is developing software like carrying rocks? I’d argue that in many ways it is. Programming requires keeping a bunch of things in mind and if you lose your mental grip on any of them for a moment you either have to backtrack to re-figure out how things fit together or, worse yet, you proceed with a faulty understanding and introduce a bug which later requires a lot of time to track down and fix. In fact programming is in some ways worse than carrying rocks because the cost of a momentary slip of concentration can be much more than simply the equivalent of a rock rolling back to where you started. A bug that you create a few hours into a programming session may take many hours or even days to track down and fix. Luckily pairing can help there too — while one partner is focusing their mind on the next thing the other partner’s mind may linger for a moment and have a “Wait a sec’” moment that catches a bug before it gets too far away, the equivalent of catching a dropped rock before it rolls all the way back down the hill. Or when bugs do get in, a pair can often find them faster than a single programmer, much the way two people would be able to find a dropped rock if it didn’t just roll back to the bottom of the hill but bounced off in some random direction into thick weeds.

Software estimation considered harmful?

April 26, 2007

A few weeks ago I was in the midst of reading Steve McConnell’s Software Estimation: Demystifying the Black Art when I had the conversation with my friend Marc that I have written about previously. Marc, as I mentioned before, is the founder of and Chief Product Officer at a small startup called Wesabe. When I told him what I was reading, he asked, “Do you believe it?”

“Sure,” I said. By which I meant nothing in the book had struck me as patently bogus. A lot of it is good sense about the limits of estimation, the relation of estimation to planning, and why estimation is so hard. Parts II and III of the book present specific estimation techniques that, assuming one had the relevant inputs and historical data, seem likely to be capable of producing fairly accurate estimates. Now, the descriptions of some of these techniques made me think — wow, if that’s what it takes to produce good estimates, no wonder we all muck along with crappy ones. But it did for the most part seem like the kind of thing we Serious Software Professionals™ should be doing.

Marc — it turns out — is far more skeptical about the whole enterprise of software estimation. He tells me that at Wesabe they never make schedule estimates. He manages his developers, as he has explained in a blog entry, by trying to get them excited enough about the features he thinks should be added to Wesabe that they decide to go ahead and add them. Or they may get excited about their own ideas and add them instead. Marc retains final authority over what gets added to the product and a Wesabe developer who consistently gets excited about developing things that Marc refuses to allow into the product should probably make sure their resume is up to date. But he never asks them for estimates. He does encourage his developers to spend most of their time working on things that they can finish quickly and get into the product, but that seems to be as much about what he thinks most likely to keep his developers happy as anything else. When they need to, Wesabe developers will tackle bigger projects, still without estimating how long they will take. Marc’s point of view, I take it, is that the only reason to do these big projects is because you have to, and if you have to, it doesn’t really matter how long it’s going to take.

Now, Marc’s a smart guy and he’s been managing software developers for as long as I’ve known him (more than a decade) and I know he thinks a lot about how he does what he does. On the other hand, Steve McConnell’s also a smart guy whose books I’ve been a big fan of for about as long. So, how to reconcile these two points of view? Is Marc’s approach only tenable in a startup? Or maybe McConnell’s approach to estimation is only worthwhile in big organizations that are doing more or less the same thing over and over again.

So I returned to reading Software Estimation with a new question in mind: Why estimate? The nearest McConnell comes to an answer is in section 1.7 “Estimation’s Real Purpose”, where he gives this definition of a good estimate:

A good estimate is an estimate that provides a clear enough view of the project reality to allow the project leadership to make good decisions about how to control the project to hit its targets.

I think the key word in McConnell’s definition is “targets”. The reason Marc can get away with not estimating is because he’s found a way to manage without setting targets. So the question “Why estimate?” is better expressed as “Why set targets?”

Sometimes we set targets in order to convince others, or ourselves, that something can be done. We may set targets to inspire ourselves to do more, though it’s not clear that’s a winning move, and even less so when managers set a target to “inspire” the folks who work for them. (See DeMarco and Lister’s Peopleware, chapter 3 and the discussion of Spanish Theory managers.) We may also set targets to give ourselves a feeling of control over the future, illusory though that feeling may be. After the fact, a target hit or missed can tell us whether or not we did what we set out to do. However if we missed a target, we can’t know whether that’s because the target was unrealistic or because we didn’t perform as well as we should have. Setting and hitting targets does make it look like we know what we’re doing but we need to keep in mind that targets rarely encompass all the things we care about — it’s much easier to set a target date for delivering software than a target for how much users will love it.

If the goal is simply to develop as much software as we can per unit time, estimates (and thus targets) may be a bad idea. In chapter 4 of Peopleware, DeMarco and Lister discuss a study done in 1985 by researchers at the University of New South Wales. According to Peopleware the study analyzed 103 actual industrial programming projects and assigned each project a value on a “weighted metric of productivity”. They then compared the average productivity scores of projects grouped by how the projects’ estimates were arrived at. They found, as folks had long suspected, that programmers are more productive when working against their own estimates as opposed to estimates created by their boss or even estimates created jointly with their boss, averaging 8.0 on the productivity metric for programmer-estimated projects vs 6.6 and 7.8 for boss-estimated and jointly-estimated projects. The study also found that on projects where estimates were made by third-party system analysts the average productivity was even higher, 9.5. This last result was a bit of a surprise, ruling out the theory that programmers are more productive when trying to meet their own estimates because they have more vested in them. But the real surprise was that the highest average productivity, with a score of 12.0, was on those projects that didn’t estimate at all.

There is, however, one other reason to estimate: to coordinate our work with others. The marketing department would like the developers to estimate what features will be included in the next release so they can get to work writing promotional materials. Or one developer wants to know when a feature another developer is working on will be ready so he can plan his own work that depends on it. Note however, that in cases like this, estimates are really just a tool for communication. Marketing needs to know what’s going to end up in the release and the developers, by virtue of being the ones building it, have the information and somehow that information has to be communicated from the developers to the marketers. But there are lots of ways that could happen. In a small company it might happen for free — everyone knows what everyone else is working on so the marketers will have as good an idea as anyone what’s actually going to be ready for the release. If water-cooler conversations are insufficient, then marketing and development could meet to talk about it on a regular basis. Or the developers could maintain an internal wiki about what’s going on in development. Some of these methods may work better than others but it’s not a given that using estimates is always the best way.

To decide whether estimates are a good way to communicate, we need a way to compare different methods of communication. I’d argue that all methods of communication can be modeled, for our present purposes, with the following five parameters:

  • Preparation time required
  • Communication time required
  • Rate at which information is conveyed
  • Rate at which misinformation is conveyed
  • Other benefits and costs

The idea is that communication happens in this pattern: one or more people spend some amount of time preparing to communicate. This would include activities such as thinking, writing, estimating, etc. Then the communication proper happens, which takes some time. This may require time from both the sender and the receiver (conversations, meetings, presentations, etc.) or only the receiver (reading an email, looking at a web site).

After the communication is complete, some amount of information has been conveyed and also, sadly, some amount of misinformation. The misinformation may arise from faulty preparation, from misstatements by the sender, or from misunderstandings by the receiver. Obviously different methods of communication will be able to convey more or less information in a given amount of time and will be more or less prone to miscommunication.

Finally, different methods of communication can have benefits beyond the information conveyed and costs other than the time spent and the misinformation conveyed. For instance, chatting around the water-cooler may build team cohesion while highly contentious meetings may have the opposite effect. Another important kind of benefit is that the preparation and communication phases may themselves improve the communicators’ understanding of what they are communicating about. For example, writing clearly on any topic invariably forces the writer to clarify their own thoughts and the give and take in a well-run meeting can help bring out and refine what were previously only amorphous notions.

Now we can look at estimation as a communication tool according to this model. The preparation time for good estimates is fairly high. This is why so many developers try to avoid estimating or give it short shrift. The communication time is low — since an estimate distills things to the essence of “these features by this date” or “this much effort for that much functionality” it can be quickly conveyed. However, exactly because estimates do distill things, they are a low bandwidth form of communication. Without a lot of other communication about what exactly the features are going to look like, a list of features and the dates when they will be done leaves out a lot. Worse yet, estimates are notoriously prone to conveying misinformation, either because the estimate is inaccurate or because an accurate estimate is misunderstood. McConnell devotes a whole chapter, “Estimate Presentation Styles”, to discussing how to present estimates so they will be less likely to be misunderstood, but no presentation style can help the poor estimator who’s giving an estimate to an audience that hears “2 to 4 weeks” as “2 weeks”.

When it comes to other costs and benefits, it’s hard to say. If we assume for the sake of argument that it is possible to make good estimates, does the very act of preparing the estimate have its own benefits? Certainly, making a good estimate requires identifying all the work to be done, so a team that has gone through the exercise of estimating may identify all the work that actually needs to be done sooner than a team that hasn’t, which may allow the work to be done more efficiently. Another potential consideration is that in a company where non-developers expect developers to be able to provide reliable estimates, meeting those expectations has the social benefit of giving the rest of the company confidence in their development team.

On the other hand, there’s no guarantee of obtaining those benefits. Estimating badly certainly has bad ancillary costs. Leaving aside the consequences in terms of the misinformation it generates, it’s just no fun to estimate when you know your estimates are bad. Either you feel guilty because you believe it’s possible to do well and that you should be better at it or you think it’s actually impossible and therefore whoever is asking you for the estimate is wasting your time. And if good estimates can increase the company’s confidence in their developers, bad estimating can destroy it. Or, worse yet, developers can be unfairly deemed unreliable because the rest of the company expects more precision in estimation than is actually possible. According to McConnell, at the beginning of a project even the best estimators can be up to 4x off in either direction. Which means unless folks can accept an initial estimate of “from three months to four years” and a promise to refine it over time, developers are bound to disappoint.

So we have: High preparation cost. Low communication costs. Low bandwidth with high potential for misinformation. Ambiguous ancillary costs and benefits. Clearly, if you’re going to estimate, you’d better do it well and accept the costs that entails. But if you doubt it’s possible to do well, either in principle, or in your particular situation, then it can be useful to think about other ways to make sure the necessary communication happens.