Friday, October 24, 2008

Note to Self: Software Developers are not the Center of the Universe

Those who have read my previous blog entries will have noticed by now that I like to take a somewhat contrary tone in my writing. Of course, in my everyday career life, I find plenty of useful, if uninteresting, things to think about in the software world. New technologies such as Flex are making so much progress that I get tired just trying to keep up, and of course there remain the day-to-day minutiae that really aren't that interesting to write about and would be even less interesting to read about. For those interested in keeping up with the latest technological happenings in the development world, I would recommend Dustin Marx's blog. He has a passion for what he writes and the energy of a man ten years my junior when it comes to documenting his findings. For my own part, I try to document the little epiphanies that I occasionally stumble across during my development activities. Recently, I had another one.

I was reminded of something so common that it is often overlooked, both by me and by the other developers I've met: we are not the center of the universe. Putting aside the "no duh" obviousness of that statement for a moment, allow me to point to a couple of examples where this principle seems to be forgotten.

I was at a conference a few months ago listening to an expert panel argue (again) about the merits of static versus dynamic languages. One of the experts was railing against Java as being "high-ceremony" and generally too verbose. He complained about the number of checked exceptions that need to be handled when performing file I/O. He argued that many catch blocks are usually left blank and that the language shouldn't force developers to write so much code to handle these things. He finished the argument with an almost off-handed comment, "… especially since all of those exceptions pretty much never occur anyway," or something to that effect.

At that point, another expert on the same panel responded, "As someone who's worked in an operations center when a hard disk was failing, I can tell you that those exceptions do occur and they should be treated seriously by the developer, even if they are rare." After pausing for several months to digest what I'd just witnessed, the blinding flash of obviousness finally hit me: the day-to-day events of the software development world are not representative of the day-to-day events of the operational world, the "real" world that our software operates in. Software developers are not the center of the universe, and our typical experiences are not representative of anyone else's.

In the above example, if my hard drive started to fail I'd be thinking about the last time I backed up my data and checked in my code. I almost certainly wouldn't be thinking about how gracefully my software was handling the problem. In fact, I probably wouldn't even be running my software when I noticed things starting to go wrong. ("Quick, now that I'm getting intermittent blue screens of death, let's fire up the web server and see if the sessions fail gracefully.")

I don't think that the importance of that one-in-a-million hard disk failure had ever occurred to the first expert when he was complaining about having to catch too many kinds of exceptions. (And no, you should never leave catch blocks blank.) But as developers, we often spend so much of our time focused on the act of software creation that we rarely experience (or think about) the everyday life of the users. I'm not talking about software usability (though that sometimes shows up as a symptom). What I'm pointing out is that we regularly choose languages, libraries, tools, and practices that make our lives easier without really considering what happens after we deliver the product and go away.
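For what it's worth, here is a minimal sketch of what "not leaving it blank" might look like in Java. It is only an illustration under my own assumptions: the class name ConfigLoader and the method readIfPresent are made up for this post, and the choice to treat a missing file as "use the defaults" while letting every other I/O failure propagate is just one reasonable policy, not the only one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; the name and the policy are assumptions for illustration.
public class ConfigLoader {
    private static final Logger LOG = Logger.getLogger(ConfigLoader.class.getName());

    /**
     * Returns the file contents, or an empty Optional if the file does not exist.
     * Any other I/O failure (for example, a failing disk) is logged and rethrown
     * so that the caller, and eventually the operations floor, hears about it.
     */
    static Optional<String> readIfPresent(Path path) throws IOException {
        try {
            byte[] bytes = Files.readAllBytes(path);
            return Optional.of(new String(bytes, StandardCharsets.UTF_8));
        } catch (NoSuchFileException e) {
            // Expected case: nothing to read, so fall back to defaults.
            LOG.log(Level.INFO, "No file at {0}; using defaults", path);
            return Optional.empty();
        } catch (IOException e) {
            // Rare case: record it and pass it on instead of swallowing it.
            LOG.log(Level.SEVERE, "I/O failure reading " + path, e);
            throw e;
        }
    }
}
```

The point is not the few extra lines. The point is that the rare branch leaves a trace that someone in operations can find at three in the morning.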

Here's another example. I've met many open-source proponents who honestly don't understand why any company would purchase a half-million-dollar application server, database, ESB, or portal server when there are plenty of open-source offerings available "for free." Answer: only a small portion of the total cost of owning any of these products is incurred during development. The majority of the costs come from keeping our software running day after day, year after year. In that time frame, when (not if) a problem occurs, the $500,000 in licensing fees is a drop in the bucket next to the potential losses if the enterprise fails catastrophically. Companies will spend that money up front if it will help to keep those clusters running smoothly.

Most developers that I've met don't think about their software that way. (That included me until recently.) Instead, they see the headaches with installing, configuring, and learning a new "proprietary" tool. They look at the time it takes to get a server "out of the box" and running and feel that it takes way too long and involves too many steps. They tend to worry more about the turnaround time when hot deploying their latest compilation and less about what it would take to roll back a new version of their software without taking an entire cluster down if something goes wrong.

All of these issues are valid too… for a developer. An operations floor doesn't care if a hot deploy takes seconds or tens of minutes. They don't deploy that often. They do, however, care about not having to restart their enterprise if a piece of hardware fails for some reason.

I suppose it's fair to consider the possibility that an operations center isn't the center of the universe either. However, all of this really is meant to serve as a reminder (to me if no one else) that while we are paid to solve problems, they're our customers' problems. Our problems are often secondary in the grand scheme of things.