Archive for October, 2009

C++ in Coders at Work

October 16, 2009

One of the topics I asked most of my Coders at Work interviewees about was C++. I am not an expert, or even a competent C++ programmer and recognize that my own opinions about C++ are not well-informed enough to be worth much.1 But C++ fascinates me—it’s obviously a hugely successful language: most “serious” desktop apps are still written in C++ despite the recent inroads made by Objective C on OS X and perhaps some C# on Windows; the core of Google’s search engine is written in C++; and C++ dominates the games industry. Yet C++ is also frequently reviled both by those who never use and by those who use it all the time.

That was certainly reflected in the responses I got from my Coders interviewees when I asked them about it. Jamie Zawinski, as I’ve discussed recently, fought tooth and nail to keep C++ out of the Netscape code base (and eventually lost). Some of that was due to the immaturity of C++ compilers and libraries at the time, circa 1994, but it seems also to have to do with his estimation of the language as a language:

C++ is just an abomination. Everything is wrong with it in every way. So I really tried to avoid using that as much as I could and do everything in C at Netscape.

Part of Zawinski’s issue with C++ is that it is simply too complex:

When you’re programming C++ no one can ever agree on which ten percent of the language is safe to use. There’s going to be one guy who decides, “I have to use templates.” And then you discover that there are no two compilers that implement templates the same way.

Note that Zawinski had started his career as a Lisp programmer but also used C for many years while working on Netscape. And he later enjoyed working in Java. So it’s not that C++ was either too high-level or too low-level for him or that he couldn’t wrap his head around object orientation.

Joshua Bloch, who also hacked low level C code for many years before becoming a big-time Java head, told me that he didn’t get into object-oriented programming until quite late: “Java was the first object-oriented language I used with any seriousness, in part because I couldn’t exactly bring myself to use C++.” He echoed Zawinski’s point about how C++ forces programmers to subset the language:

I think C++ was pushed well beyond its complexity threshold and yet there are a lot of people programming it. But what you do is you force people to subset it. So almost every shop that I know of that uses C++ says, “Yes, we’re using C++ but we’re not doing multiple-implementation inheritance and we’re not using operator overloading.” There are just a bunch of features that you’re not going to use because the complexity of the resulting code is too high. And I don’t think it’s good when you have to start doing that. You lose this programmer portability where everyone can read everyone else’s code, which I think is such a good thing.

Ken Thompson, who still mostly uses C despite working at Google which is largely a C++ shop, has had as long an exposure to C++ as just about anyone, having worked with with Bjarne Stroustrup, C++’s inventor, at Bell Labs:

I would try out the language as it was being developed and make comments on it. It was part of the work atmosphere there. And you’d write something and then the next day it wouldn’t work because the language changed. It was very unstable for a very long period of time. At some point I said, no, no more.

In an interview I said exactly that, that I didn’t use it just because it wouldn’t stay still for two days in a row. When Stroustrup read the interview he came screaming into my room about how I was undermining him and what I said mattered and I said it was a bad language. I never said it was a bad language. On and on and on. Since then I kind of avoid that kind of stuff.

At that point in the interview I almost changed the topic. Luckily I took one more try at asking for his actual opinion of C++. His reply:

It certainly has its good points. But by and large I think it’s a bad language. It does a lot of things half well and it’s just a garbage heap of ideas that are mutually exclusive. Everybody I know, whether it’s personal or corporate, selects a subset and these subsets are different. So it’s not a good language to transport an algorithm—to say, “I wrote it; here, take it.” It’s way too big, way too complex. And it’s obviously built by a committee.

Stroustrup campaigned for years and years and years, way beyond any sort of technical contributions he made to the language, to get it adopted and used. And he sort of ran all the standards committees with a whip and a chair. And he said “no” to no one. He put every feature in that language that ever existed. It wasn’t cleanly designed—it was just the union of everything that came along. And I think it suffered drastically from that.

Brendan Eich, the CTO of the Mozilla Corporation, whose Mozilla browser is written almost entirely in C++, talks about “toe loss due to C and C++’s foot guns” and when I asked him if there are any parts of programming that he doesn’t enjoy as much as he used to, he replied:

I don’t know. C++. We’re able to use most of its features—there are too many of them. It’s probably got a better type system than Java. But we’re still screwing around with ’70s debuggers and linkers, and it’s stupid. I don’t know why we put up with it.

At least among my interviewees, even the most positive comments about C++ tended to fall in the category of “damning with faint praise”. I asked Brad Fitzpatrick, who used C++ in college and again now that he’s at Google, whether he likes it:

I don’t mind it. The syntax is terrible and totally inconsistent and the error messages, at least from GCC, are ridiculous. You can get 40 pages of error spew because you forgot some semicolon. But—like anything else—you quickly memorize all the patterns. You don’t even read the words; you just see the structure and think, “Oh, yeah, I probably forgot to close the namespace in a header file.” I think the new C++ spec, even though it adds so much complexity, has a lot of stuff that’ll make it less painful to type—as far as number of keystrokes. The auto variables and the for loops. It’s more like Python style. And the lambdas. It’s enough that I could delude myself into thinking I’m writing in Python, even though it’s C++.

Dan Ingalls, who helped invent modern object oriented programming as part of Alan Kay’s team that developed Smalltalk, never found C++ compelling enough to use but isn’t totally adverse to using it:

I didn’t get that much into it. It seemed like a step forward in various ways from C, but it seemed to be not yet what the promise was, which we were already experiencing. If I had been forced to do another bottom-up implementation, instead of using machine code I would’ve maybe started with C++. And I know a couple of people who are masters of C++ and I love to see how they do things because I think they don’t rely on it for the stuff that it’s not really that good at but totally use it as almost a metaprogramming language.

Joe Armstrong, similarly, has never felt the need to learn C++:

No, C++, I can hardly read or write it. I don’t like C++; it doesn’t feel right. It’s just complicated. I like small simple languages. It didn’t feel small and simple.

And finally Guy Steele, who probably knows more about more languages than anyone I interviewed (or possibly anyone, period), has also not been drawn to C++. But he did go out of his way to try to say something nice about Stroustrup’s effort:

I have not been attracted to C++. I have written some C++ code. Anything I think I might want to write in C++ now could be done about as well and more easily in Java. Unless efficiency were the primary concern.

But I don’t want to be seen as a detractor of Bjarne Stroustrup’s effort. He set himself up a particular goal, which was to make an object-oriented language that would be fully backwards-compatible with C. That was a difficult task to set himself. And given that constraint, I think he came up with an admirable design and it has held up well. But given the kinds of goals that I have in programming, I think the decision to be backwards-compatible with C is a fatal flaw. It’s just a set of difficulties that can’t be overcome.

Obviously with only fifteen interviewees in my book I have only a sampling of possible opinions. There are great programmers who have done great work with C++ and presumably at least some of them would have had more enthusiastic things to say about it if I had spoken with them. But this is what I heard from the people I spoke with.


1. I think I once managed to read all the way through Stroustrup’s The C++ Programming Language and have looked at at least parts of The Design and Evolution of C++. But I have never done any serious programming in it. I have made a couple attempts to learn it just because I felt I should but in recent years I’ve mostly given up, thinking that perhaps Erik Naggum, scourge of Usenet, was right when he said: “life is too long to know C++ well.”

Advertisement

Wherein I attempt to establish my testing cred

October 8, 2009

After my recent posting about what some of the folks I interviewed for Coders at Work had to say about unit testing and TDD, I’ve seen comments various places from people who seem to think that I must never have really tried TDD or that I think unit testing is not useful. Neither of which is really true.

In the last two jobs I had before I quit full-time work to write books and do a bit of consulting I was, among other things, the guy who wrote the test framework used for testing all our Java code. My first framework, which I wrote in the early days of Weblogic, predated JUnit but was similar in intent—my goal was to make it as easy as possible for developers to write fine grained tests and to pinpoint as precisely as possible the cause of test failures so that we could run tests in a continuous build-and-test system that tested every check-in and reported the results. (”Build and test monkeys” we called them at Weblogic and a stuffed monkey made the rounds of the developers’ desks, sitting on the desk of whoever had last broken something.)

In addition to working on the framework itself, I spent a lot of my time at Weblogic trying to figure out how we could have finer-grained tests that would run more quickly and provide easier to track down failures than the rather coarse-grained system tests we had a lot of. (At one point, when I was working on our EJB implementation, I came up with a trick that I’m still a bit proud of: to test our implementation of the state machine implementing the EJB life-cycle, we wrote an EJB that collected a list of tokens for each of the methods called on it by the EJB container and the test, after putting the EJB through its paces, fed that sequence of tokens to a parser generated from the grammar of legal paths through the state machine.)

At my next job, at another start-up founded by one of the Weblogic founders, I was hired early on not only to work on the server part of our product but to be in charge of our software development process. We set up another battery of build-and-test monkeys, using a test framework I had written after leaving Weblogic. And having experienced the benefits of pair programming and lots of testing at Weblogic, I tried to push our process toward something like XP though I don’t think we ever did enough of the practices to really count as an XP shop. Now, with six years of hindsight, I’m still not sure whether I’m glad or sad about that.

In my role as a developer at Kenamea, one of the biggest projects I worked on was implementing a transactional object store. Other than some proof of concept code I wrote by myself, that part of the system was almost entirely pair programmed and was possibly one of the most extensively unit tested parts of our product. I don’t actually recall whether we ever did test-first programming on it but my pair and I tried to be quite strict about writing unit tests for all the code we wrote. And the tests were definitely valuable. They gave us—as unit testing advocates always promise—confidence to dramatically refactor things when necessary and they told us when we had slipped up and broken something. On the other hand, the system was multithreaded which, as Joshua Bloch points out in Coders, always makes things much harder to test. Often the tests were sufficient to show the presence of a bug without pinpointing its location; hours or days of hard thinking were often required to track down those concurrency bugs and when we found them it was usually not at all clear that there were any unit tests we could have written that would have uncovered them.

When I quit Kenamea in order to spend some time hacking Lisp, I was still interested in unit testing and TDD. (There’s a reason that one of the first “practical” chapters in Practical Common Lisp is a simple test framework.) Shortly before I started work on PCL I read Kent Beck’s book Test Driven Development and did a few experiments with “pure” TDD which I recorded in a coding diary. The first was an implementation of an algorithm for generating starting positions for Fischer Random Chess, a version of chess where the pieces in the back row are positioned randomly, subject to a few constraints. Here’s the contemporaneous record of my attempt, complete with false starts and other brainos. (The code is in Common Lisp. Lispers may be shocked to see how little Lisp I knew then, a few months before I started work on PCL. And no guarantees that this code exemplifies good Lisp style or good anything else.)

The next problem I tackled with TDD was The “Impossible Book” problem, posed on the Test-driven Development Yahoo! group around that time. Again, I recorded my attempt as I went. I’m not sure these are great examples of TDD in action; I mention them merely as evidence that I have actually spent some time trying out TDD.

My Impossible Book attempt is also, perhaps, relevant to the current discussion since I was up against the same kind of difficultly as Ron Jeffries was while trying to write his Sudoku solver. Namely, I had no real idea how to solve the problem.

However—perhaps because there was no other infrastructure code to distract myself with—I was lucky enough to realize that no amount of testing was going to help me solve a problem I didn’t know how to solve. So instead I decided, as I say in my coding diary, that “TDD is about driving the implementation”. So I went off and read a bit about how to solve the problem and then used TDD to come up with an implementation. In this case I was lucky that there was already a readily accessible write up of a way to solve the problem. If there hadn’t been I would have had to do more research and probably would have had to learn some math in order to figure out how to apply it to the problem at hand. In other words, the thing that was going to speed me up was not trying harder with TDD but recognizing that I needed to step back and try something else.

Since those experiments with TDD, I’ve mostly been writing books so I haven’t been doing much work on production code. Working on my own coding projects, which tend to be exploratory, not very large, and special purpose (i.e. they only have to work for me) I have not found myself inclined to use TDD or even a lot of unit testing. I’m not claiming that that’s due to a principled appraisal of the benefits and drawbacks of TDD or unit tests; it just hasn’t felt like the problems I’ve had writing the software I’ve wanted to write would be fixed by having more unit tests nor that writing test-first would speed up my exploration.

And that, I guess, brings me back to how I was drawn into this conversation in the first place: I think testing, of all kinds, is an important part of serious software development. I think TDD is, at the very least, an interesting way to write software and I suspect that it might be a very good way to write some kinds of software. But I think the claim that TDD always speeds you up is just bunk. It may be that for the kinds of software Uncle Bob and Tim Bray write, in the kinds of organizations where they work, over the kinds of time scales they care about, it really does always speed things up. I’m happy to believe Uncle Bob when he says that he’s seen the benefits of TDD in his own and in others’ work.

But I also think that when Jamie Zawinski says that writing unit tests would have slowed down the first release of Netscape or when Donald Knuth says that he thinks he saved time by writing out TeX in a notebook without any testing at all until he had figured out how the whole program was going to work, those are data points that need to be accounted for, not dismissed with insults about “living in the stone age” and “being willfully ignorant”. Maybe Zawinski and Knuth are wrong about their own experience. Or maybe they were making different trade offs than Uncle Bob and Bray would chose to make. At any rate, I agree with Tim Bray when he says

If you read the comments around this debate, it’s increasingly obvious that we as a profession don’t have consensus around the value of TDD. Many of our loudmouths (like me for example) have become evangelists probably to the point of being obnoxious. But there remains a strong developer faction out there, mostly just muttering in their beards, who think TDD is another flavor of architecture astronautics that’s gonna slow them down and get in their way.

Maybe TDD’s detractors are, as Uncle Bob claims, analogous to the 19th century surgeons poo-pooing the benefits of washing their hands before surgery but I find Uncle Bob’s rhetorical stance of absolute certainty disconcerting and, ironically, anti-persuasive. That is, I thought better of TDD before I read his recent postings about it. But that’s a silly reason to accept or reject a practice that might do me some good. If I go back to writing production code, I’ll certainly resume my own contemplation on the best way to mix testing with development and wouldn’t be surprised if TDD found a place in my own practice.

Unit testing in Coders at Work

October 5, 2009

In his now infamous blog post “The Duct Tape Programmer”, Joel Spolsky quoted Jamie Zawinski from my interview with him in Coders at Work talking about how they didn’t use many unit tests when developing Netscape. “Uncle Bob” Martin, chiming in to say that Spolsky posting was right in general but wrong in almost all his specific claims and criticisms, was particularly riled by Spolsky’s implication that maybe unit tests aren’t 100% necessary at all times:

As for Joel’s consistent dismissal of unit testing, he’s just wrong about that. Unit testing (done TDD style) does not slow you down, it speeds you up. One day I hope Joel eventually realizes this. Programmers who say they don’t have time to write tests are living in the stone age. They might as well be saying that man wasn’t meant to fly.

Tim Bray also jumped in to strongly agree with Uncle Bob on the importance of unit tests, though he couldn’t bring himself to actually agree with much else Uncle Bob said.

Joel is wrong to piss on unit testing, and buys into the common fantasy that it slows you down. It doesn’t slow you down, it speeds you up. It’s been a while since I’ve run a dev team, but it could happen again. If it does, the developers will use TDD or they’ll be looking for another job.

Since this all started from the Zawinski interview in Coders at Work and since there were other people interviewed for the book, I figured it might be interesting to see what some of the other folks I talked to had to say about unit testing and things like TDD (“test driven development” or sometimes “test driven design”, for those of you behind on your acronyms.)

To start with, here’s a bit more context from the Zawinski interview:

Seibel: What about developer-level tests like unit tests?

Zawinski: Nah. We never did any of that. I did occasionally for some things. The date parser for mail headers had a gigantic set of test cases. Back then, at least, no one really paid a whole lot of attention to the standards. So you got all kinds of crap in the headers. And whatever you’re throwing at us, people are going to be annoyed if their mail sorts wrong. So I collected a whole bunch of examples online and just made stuff up and had this giant list of crappily formatted dates and the number I thought that should turn into. And every time I’d change the code I’d run through the tests and some of them would flip. Well, do I agree with that or not?

Seibel: Did that kind of thing get folded into any kind of automated testing?

Zawinski: No, when I was writing unit tests like that for my code they would basically only run when I ran them. We did a little bit of that later with Grendel, the Java rewrite, because it was just so much easier to write a unit test when you write a new class.

Seibel: In retrospect, do you think you suffered at all because of that? Would development have been easier or faster if you guys had been more disciplined about testing?

Zawinski: I don’t think so. I think it would have just slowed us down. There’s a lot to be said for just getting it right the first time. In the early days we were so focused on speed. We had to ship the thing even if it wasn’t perfect. We can ship it later and it would be higher quality but someone else might have eaten our lunch by then.

There’s bound to be stuff where this would have gone faster if we’d had unit tests or smaller modules or whatever. That all sounds great in principle. Given a leisurely development pace, that’s certainly the way to go. But when you’re looking at, “We’ve got to go from zero to done in six weeks,” well, I can’t do that unless I cut something out. And what I’m going to cut out is the stuff that’s not absolutely critical. And unit tests are not critical. If there’s no unit test the customer isn’t going to complain about that. That’s an upstream issue.

I hope I don’t sound like I’m saying, “Testing is for chumps.” It’s not. It’s a matter of priorities. Are you trying to write good software or are you trying to be done by next week? You can’t do both. One of the jokes we made at Netscape a lot was, “We’re absolutely 100 percent committed to quality. We’re going to ship the highest-quality product we can on March 31st.”

So Zawinski says unit testing would have slowed them down. Uncle Bob and Tim Bray both say that unit testing, “doesn’t slow you down, it speeds you up.” Did Zawinski and the rest of the Netscape gang just blow it? They were going all out to develop their software as fast as they could; could they have sped things up with more unit testing? Maybe they were just living in the stone age.

Now, if Uncle Bob and Bray wanted to make a less radical claim than that unit testing always speeds you up, they could point out that unit tests can help you go faster over the longer term, and it’s not clear even Zawinski would disagree. And they’d definitely get some strong support for that claim from the subject of chapter two of Coders, Brad Fitzpatrick. I asked him about any big differences between his early and later programming style:

Fitzpatrick: I’ve also done a lot of testing since LiveJournal. Once I started working with other people especially. And once I realized that code I write never fucking goes away and I’m going to be a maintainer for life. I get comments about blog posts that are almost 10 years old. “Hey, I found this code. I found a bug,” and I’m suddenly maintaining code.

I now maintain so much code, and there’s other people working with it, if there’s anything halfway clever at all, I just assume that somebody else is going to not understand some invariants I have. So basically anytime I do something clever, I make sure I have a test in there to break really loudly and to tell them that they messed up. I had to force a lot of people to write tests, mostly people who were working for me. I would write tests to guard against my own code breaking, and then once they wrote code, I was like, “Are you even sure that works? Write a test. Prove it to me.” At a certain point, people realize, “Holy crap, it does pay off,” especially maintenance costs later.

Another interviewee, Joshua Bloch described how he designs code by starting with the APIs. He claimed this is a sort of “test-first programming and refactoring applied to APIs” since the first thing he does with a newly designed is test whether it would support the use cases that had lead to creating the API in the first place. But since he does all that writing any runnable code, that could also be called old-fashioned, “thinking about what you’re going to do before you do it” programming. Bloch did dispute the claim of those TDD advocates who say the tests produced by TDD can function as a spec for the code under test:

I don’t think tests are even remotely an acceptable substitute for documentation. Once you’re trying to write something that other people can code to, you need precise specs, and the tests should test that the code conforms to those specs.

Elsewhere Bloch described how he used both system and unit testing when he was working on an implementation of transactional shared-memory:

To test the code, I wrote a monstrous “basher.” It ran lots of transactions, each of which contained nested transactions, recursively up to some maximum nesting depth. Each of the nested transactions would lock and read several elements of a shared array in ascending order and add something to each element, preserving the invariant that the sum of all the elements in the array was zero. Each subtransaction was either committed or aborted—90 percent commits, 10 percent aborts, or whatever. Multiple threads ran these transactions concurrently and beat on the array for a prolonged period. Since it was a shared-memory facility that I was testing, I ran multiple multithreaded bashers concurrently, each in its own process.

At reasonable concurrency levels, the basher passed with flying colors. But when I really cranked up the concurrency, I found that occasionally, just occasionally, the basher would fail its consistency check. I had no idea what was going on. Of course I assumed it was my fault because I had written all of this new code.

After the system test demonstrated the presence of a bug he turned to unit tests to find it:

I spent a week or so writing painfully thorough unit tests of each component, and all the tests passed. Then I wrote detailed consistency checks for each internal data structure, so I could call the consistency checks after every mutation until a test failed. Finally I caught a low-level consistency check failing—not repeatably, but in a way that allowed me to analyze what was going on. And I came to the inescapable conclusion that my locks weren’t working. I had concurrent read-modify-write sequences taking place in which two transactions locked, read, and wrote the same value and the last write was clobbering the first.

I had written my own lock manager, so of course I suspected it. But the lock manager was passing its unit tests with flying colors. In the end, I determined that what was broken wasn’t the lock manager, but the underlying mutex implementation! This was before the days when operating systems supported threads, so we had to write our own threading package. It turned out that the engineer responsible for the mutex code had accidentally exchanged the labels on the lock and try-lock routines in the assembly code for our Solaris threading implementation. So every time you thought you were calling lock, you were actually calling try-lock, and vice versa. Which means that when there was actual contention—rare in those days—the second thread just sailed into the critical section as if the first thread didn’t have the lock. The funny thing was that that this meant the whole company had been running without mutexes for a couple weeks, and nobody noticed.

I asked him if he though the author of the mutex code that had been the cause of his problems could or even should have caught the bug with his own unit tests:

I think a good automated unit test of the mutex facility could have saved me from this particular agony, but keep in mind that this was in the early ’90s. It never even occurred to me to blame the engineer involved for not writing good enough unit tests. Even today, writing unit tests for concurrency utilities is an art form.

Donald Knuth, who is also a fan of after-the-fact torture tests, described an approach to coding about as far away from TDD as you can imagine, which he used when originally developing his typesetting system, TeX:

Knuth: When I wrote TeX originally in 1977 and ’78, of course I didn’t have literate programming but I did have structured programming. I wrote it in a big notebook in longhand, in pencil.

Six months later, after I had gone through the whole project, I started typing into the computer. And did the debugging in March of ’78 while I had started writing the program in October of ’77. The code for that is in the Stanford archives—it’s all in pencil—and of course I would come back and change a subroutine as I learned what it should be.

This was a first-generation system, so lots of different architectures were possible and had to be discarded until I’d lived with it for a while and knew what was there. And it was a chicken-and-egg problem—you couldn’t typeset until you had fonts but then you couldn’t have fonts until you could typeset.

But structured programming gave me the idea of invariants and knowing how to make black boxes that I could understand. So I had the confidence that the code would work when I finally would debug it. I felt that I would be saving a lot of time if I waited six months before testing anything. I had enough confidence that the code was approximately right.

Seibel: And the time savings would be because you wouldn’t spend time building scaffolding and stubs to test incomplete code?

Knuth: Right.

So Knuth too disagrees with the notion that unit testing always makes you go faster. Maybe he too is living in the stone age.

Joe Armstrong, on the other hand, says he has moved toward a test-first development style recently:

Seibel: At the point that you start typing code, do you code top-down or bottom-up or middle-out?

Armstrong: Bottom up. I write a little bit and test it, write a little bit and test it. I’ve gone over to this writing test cases first, now. Unit testing. Just write the test cases and then write the code. I feel fairly confident that it works.

The only interviewee who touched directly on TDD versus other approaches was Peter Norvig. He said he does more unit testing than he used to and even said some nice things about TDD but pointed out:

It’s also important to know what you’re doing. When I wrote my Sudoku solver, some bloggers commented on that. They said, “Look at the contrast—here’s Norvig’s Sudoku thing and then there’s this other guy,” whose name I’ve forgotten, one of these test-driven design gurus. He starts off and he says, “Well, I’m going to do Sudoku and I’m going to have this class and first thing I’m going to do is write a bunch of tests.” But then he never got anywhere. He had five different blog posts and in each one he wrote a little bit more and wrote lots of tests but he never got anything working because he didn’t know how to solve the problem.

A bit later Norvig said:

Then bloggers were arguing back and forth about what this means. I don’t think it means much of anything—I think test-driven design is great. I do that a lot more than I used to do. But you can test all you want and if you don’t know how to approach the problem, you’re not going to get a solution.

Ignoring Norvig’s suggestion that the difference between the two attempts doesn’t mean much of anything, it is instructive (or at least interesting, in a rubber-necking kind of way) to look at the two writeups. The “other guy” turns out to be Ron Jeffries, one of the inventors of Extreme Programming, the author of two books on XP, and according to his website an “experienced XP author, trainer, coach, and practitioner”.

Norvig’s writeup is a short essay explaining about 100 lines of Python that can solve any Sudoku. Jeffries writeup, by contrast, is spread over five lengthy blog postings here, here, here, here, and here and ends without coming anywhere close to actually producing a program that can solve any but a tiny subset of all Sudoku problems.

At some level the difference between the two simply boils down—as Norvig suggests—to knowledge: Norvig knew how to solve the problem because it’s a specific instance of a kind of problem he already knew how to solve. Jeffries, obviously, did not. But he did choose to tackle this particular problem using TDD, a technique in which he is supposed to be the expert? Why did he have so little success?

One thing I noticed, reading through Jeffries’s blog posts, was that he got fixated on the problem of how to represent a Sudoku board. He immediately started writing tests of the low-level details of a few functions for manipulating a data structure representing the 9×9 Sudoku board and a few functions for getting at the rows, columns, and boxes of the board. (“Boxes” are what Sudoku players call the 3×3 squares subsquares of the 9×9 board.)

Then he basically wandered around for the rest of his five blog postings fiddling with the representation, making it more “object oriented” and then fixing up the tests to work with the new representation and so on until eventually, it seems, he just got bored and gave up, having made only one minor stab at the problem of actually solving puzzles.

I suspect, having done a small amount of TDD myself, that this is actually a pattern that arises when a programmer tries to apply TDD to a problem they just don’t know how to solve. If I was a high-priced consultant/trainer like Jeffries, I’d probably give this pattern a pithy name like “Going in Circles Means You Don’t Know What You’re Doing”. Because he had no idea how to tackle the real problem, the only kinds of tests he could think of were either the very high-level “the program works” kind which were obviously too much of a leap or low-level tests of nitty-gritty code that is necessary but not at all sufficient for a working solver.

However, since most of what Jeffries spent his time on was the code for representing a Sudoku board and determining which row, column, and box a given square on the board is in, let’s look at that part of Norvig’s code.

Norvig’s basic strategy is to represent a board using a hash table with the keys being row-column pairs like A1, A2, and so on up to I9. After seeing the mess Jeffries makes of trying represent a board this is a refreshingly simple choice. It also seems to me that it requires a bit of creativity: given that a Sudoku board is a 9×9 board, I suspect I’m not the only programmer in the world who might be inclined to start with a 2d array. In a language without true 2d arrays, I might then be tempted, as Jeffries was, to then use an 81-element array and then get all wrapped around the axle, as Jeffries did, making sure I haven’t screwed up the finicky math for converting between 1d and 2d indices. And for all we know Norvig fell into the same trap and only later realized that all he really needed was the easy random access provided by a hash table. But pretty clearly Jeffries’s approach of testing the heck out of an array-based implementation wasn’t sufficient to lead him to the much better hash table-based one.

Given his choice to use a hash table, Norvig needs a list of the keys, i.e. the Cartesian product of the row labels (A-I) and the column labels (1-9). So the first bit of code he shows is a function cross which implements a Cartesian product that combines the pairs of elements by concatenation. The standard mathematical definition of the Cartesian product is:

A × B = {(a,b) | a ∈ A and b ∈ B }
        

Python’s list comprehensions let Norvig express this function in essentially the same notation as the standard mathematical definition:

def cross(A, B):
    return [a+b for a in A for b in B]
        

Could he or should he have developed this function via a test-first strategy? Dunno. Would it have been faster? Possibly, if he had any problems getting it right. But given that he’s just transcribing a mathematical definition, maybe a quick check at Python’s REPL would be sufficient to make sure he hadn’t screwed anything up.

Next he defines two variables to hold the row and column labels:

rows = 'ABCDEFGHI'
cols = '123456789'
        

Do TDD people unit test their data? I don’t know. Should they? I’m not even sure what that would mean, at least in a case like this. At any rate, he then feeds these two untested values to the cross function to produce the list of all 81 squares:

squares  = cross(rows, cols)
        

Now he’s basically done with the representation of the board. Any hash table, with the elements of squares as its keys represents a Sudoku board. Later in his code Norvig will use a hash table with strings containing the possible digits that could be put in each square as values.

However there is one other bit of work to be done: to solve a Sudoku you’re going to need to be able to map from a square to the other squares in the same row, column, and box. Having read a bit about Sudoku solving, Norvig has discovered that people use the term ‘units’ to refer to the rows, columns, and boxes and ‘peers’ to refer to all the squares that are in one of the three ‘units’ of another square. Norvig, unlike Jeffries, realized that this is better represented in data than in code and proceeds to compute the data once and for all.

First he makes a list of all the units, i.e. all rows, columns, and boxes, using list comprehensions and his cross function:

unitlist = ([cross(rows, c) for c in cols] +
            [cross(r, cols) for r in rows] +
            [cross(rs, cs) for rs in ('ABC','DEF','GHI') for cs in ('123','456','789')])
        

Now, he can use unitlist to generate a dictionary, units, that maps each square name to a list of its three units. He completely brute-forces this, linearly scanning unitlist for each square, selecting the units containing the square which itself requires a linear scan of each element of unitlist. But why write something more clever when unitlist is only 27 elements long and it’s elements are each 9 elements long and this whole computation is only going to happen once anyway? Here’s the code:

units = dict((s, [u for u in unitlist if s in u]) 
             for s in squares)
        

Once he’s got units, which will also be used in its own right later, he can compute peers, a hash table that maps from square names to the set of peer squares:

peers = dict((s, set(s2 for u in units[s] for s2 in u if s2 != s))
             for s in squares)
        

And that’s it: 7 definitions in 12 lines of code and he’s done with data representation. I’m not sure how much code Jeffries ended up with. In his fourth installment he had about 81 lines devoted to providing slightly less functionality than Norvig provided in the code we just looked at. In the fifth (and mercifully final) installment, he started adding classes and subclasses and moving things around but never presented all the code again. Safe to say it ended up quite a lot more than 12 lines; if he’s lucky it stayed under 120.

I’m not a proponent (or particularly a detractor) of TDD. If I was, I’d be pretty strongly tempted to throw Jeffries under the bus—maybe TDD isn’t quite as bad as he makes it look in this exercise. It certainly seems that, within the constraints of TDD, he could have done a much better job. Perhaps if he had stopped to think a bit about what he was doing he could have, using TDD, ended up with code as simple as Norvig’s. For instance, if he had started a bit closer to the problem domain he could have started by writing tests of his ability to map from squares to peers and units and then implemented something to provide that functionality. So maybe Norvig is right—maybe there’s not much to learn from this episode except that Fred Brooks is still right and there are still no silver bullets.

Anyway, those are some of the highlights of, and some of the context around, what the folks I interviewed for Coders had to say about testing. There are probably some other good bits I’m forgetting at the moment. Feel free to buy a copy and look for them yourself.