“Size is the Enemy” is a Jeff Atwood (Coding Horror) article that has made the rounds lately. It’s a response to a rambling rant by Steve Yegge about how large code bases are unwieldy, and how language choice affects the viability of software projects beyond a certain size. Early on, Jeff calls out a quote that I, too, find really amusing:
I happen to hold a hard-won minority opinion about code bases. In particular I believe, quite staunchly I might add, that the worst thing that can happen to a code base is size.
I don’t think I’ve ever encountered a programmer who truly believed that “bigger is better” with regard to codebase size. Granted, the crowd I tend to run with isn’t exactly a bunch of Perl golfers either, but no reasonable person would doubt that complexity scales superlinearly with lines of code. So, at least with regard to the sentiment that codebase size should be minimized, I find myself in agreement with both authors: if you can reduce the size of your code while maintaining readability, then it’s an easy win. For example, on Neverwinter Nights 2, we drastically reduced the number of scripts required by adding a parameter-passing mechanism to NWScript. I actually ran the numbers and discovered that 75% of the script calls in conversations used our new, parameterized scripts rather than the one-off style of scripts from NWN1. This is a great example of reducing code size without reducing functionality, and it made things much more manageable.
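To make the parameterized-script idea a bit more concrete, here is a rough sketch. It is Python rather than NWScript, and every name in it (`give_gold`, `SCRIPTS`, `run_script_call`) is made up for illustration; it only shows the general shape of replacing a pile of one-off scripts with one script that takes arguments.

```python
# Hypothetical sketch: names and data are illustrative, not taken from NWN2.

# One-off style: every conversation node needs its own tiny script.
def give_player_50_gold(player):
    player["gold"] += 50

def give_player_100_gold(player):
    player["gold"] += 100

# Parameterized style: one script, with the amount supplied by the caller.
def give_gold(player, amount):
    player["gold"] += amount

# Registry of parameterized scripts, keyed by the name a conversation uses.
SCRIPTS = {"give_gold": give_gold}

def run_script_call(player, call):
    """Look up the named script and invoke it with the node's parameters."""
    name, args = call
    SCRIPTS[name](player, *args)

# A conversation node now stores a script name plus its arguments.
player = {"gold": 0}
run_script_call(player, ("give_gold", (50,)))
run_script_call(player, ("give_gold", (100,)))
```

With this shape, a new reward amount is a new piece of data on a conversation node, not a new script to write, name, and maintain.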
I take issue with the second conclusion of Yegge’s article, though. While it’s true that functional languages can often express algorithms more eloquently than imperative languages, I don’t think this magically translates into huge maintainability wins in a monster codebase. Sure, the trivial example of reading lines out of a file makes other languages look great in comparison to hoary old C++. (I’m not sure if I’m allowed to call Java “hoary old Java” yet; after all, it’s only 12 years old or so.) But holding up this example as proof of language superiority misses the bigger picture: what makes an application distinctive is not how it reads lines out of a text file, but rather all the other stuff it does that no other piece of software does. The design of that “stuff” probably has more to do with maintainability than language choice ever could.
The biggest factor in the complexity of a code base, in my opinion, is the complexity of its internal interfaces. I hate to trot out the quixotic concept of the “software IC”, but thinking about an interface in terms of building it onto a chip is a decent analogy in this case. Once you get beyond a certain number of “pins,” or command multiplexers, or what have you, things get complicated. When you have 500,000 lines of this kind of “complicated,” all interacting with each other in mysterious ways, you have big trouble.
Now, I’m not advocating going gonzo with componentization, either, in spite of how delicious ravioli code sounds. I’ve seen and heard about way too many over-engineered, CORBA-gone-wild projects to make any blanket statement in support of component architectures. But by keeping interfaces (whether internal to the code, or part of a component system) minimalist in nature, and designing them so that doing the right thing is easy and doing the wrong thing is hard, you can maintain a good level of understandability in a codebase. I see no reason why this advantage would not scale with project size. Dividing a codebase into easily testable, well-defined components with simple interfaces is key, particularly for lone-wolf developers.
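As a small illustration of “the right thing is easy, the wrong thing is hard,” here is a minimal Python sketch. The `Inventory` class and its methods are hypothetical, invented for this example; the point is only that a narrow interface can enforce its own rules so callers don’t have to remember them.

```python
class Inventory:
    """Hypothetical component with a deliberately small interface:
    two mutators and one read-only query."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = []

    def add(self, item):
        # The capacity rule lives here, so no caller can forget to check it.
        if len(self._items) >= self._capacity:
            raise ValueError("inventory is full")
        self._items.append(item)

    def remove(self, item):
        # list.remove raises ValueError if the item is not present.
        self._items.remove(item)

    @property
    def items(self):
        # Callers get an immutable snapshot, not the internal list,
        # so they cannot bypass add() and remove().
        return tuple(self._items)


inv = Inventory(capacity=2)
inv.add("sword")
inv.add("potion")
```

A caller who tries to sneak in a third item through `inv.items` gets a tuple that cannot be appended to, and going through `add` raises an error. The interface has three “pins,” and each one is easy to test in isolation.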
I find it very interesting that the project in Yegge’s article (Wyvern) is not just a game, but a role-playing game. I know first-hand that RPG rules systems can, by their nature, necessitate a kind of code design that leads to massive complexity. RPGs tend to carry a massive amount of state around, and have rules systems that interact with that state in often arbitrary ways. (An example of this would be a character trait that changes the order of combat resolution, or some kind of “luck” trait that allows re-rolls of certain types of skill checks. If you have many rule-changers like this, it’s almost impossible to write a clean system to handle it.) This kind of complexity is design complexity, and has nothing to do with programming languages and the features they support. For this reason, even if he succeeds in his goal of removing 33% of Wyvern’s lines of code, I don’t think that the resulting codebase will be any easier to maintain. (His game is a 2D Java RPG, so I don’t think that it’s a stretch to say that the majority of his code is going to be related to game rules and game content.)
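Here is a toy Python sketch of the two rule-changers mentioned above. The trait names (`lucky`, `ambusher`) and the d20 mechanics are assumptions made up for this example, not anything from Wyvern or a real rules system. Notice that each trait ends up as a special case wedged into otherwise generic code, and that would be true in any language.

```python
import random

def roll_d20():
    """Default die roll; tests can pass in a scripted roll instead."""
    return random.randint(1, 20)

def skill_check(character, difficulty, roll=roll_d20):
    """Succeed if the roll meets the difficulty.
    Special case: a hypothetical 'lucky' trait grants one re-roll on failure."""
    result = roll()
    if result < difficulty and "lucky" in character["traits"]:
        result = roll()
    return result >= difficulty

def combat_order(characters):
    """Order characters by descending initiative.
    Special case: a hypothetical 'ambusher' trait always acts first."""
    return sorted(
        characters,
        key=lambda c: ("ambusher" not in c["traits"], -c["initiative"]),
    )

party = [
    {"name": "knight", "initiative": 15, "traits": set()},
    {"name": "rogue", "initiative": 5, "traits": {"ambusher"}},
    {"name": "mage", "initiative": 8, "traits": {"lucky"}},
]
order = [c["name"] for c in combat_order(party)]
```

Two traits are manageable. The trouble starts when there are dozens of them and they begin to interact (does a lucky ambusher re-roll initiative?), because every new rule has to be checked against every existing exception.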
I think that both authors are missing a key point: if you can simplify your application or problem, you should do so. Wyvern should look more to Magic: The Gathering’s rules than D&D. (Granted, Magic has its own mind-benders, but I think that it’s fundamentally simpler than even the newer versions of D&D. Given that Magic was designed by a math professor, that’s what I would expect.) Sometimes it’s not possible to simplify a problem any further, but in this case Wyvern’s complexity appears to be a self-inflicted wound. And, finally, I am still in agreement that code size reduction is a wonderful thing, but I am less enthused about the opinions rendered in the language wars…