2021-10-04

Anecdote About Legacy Code

I read a piece where someone argued that code should be rewritten every five years. I can't be bothered to find it again to link to it, but suffice it to say I disagree. There's not a single system I've worked with that could feasibly be rewritten that often; many took more than five years to implement in the first place.

So you have to deal with a codebase that's been around since before you joined the team? And nobody on the team was there at its inception? So what? They should have told you in computer science class that that's what the industry looks like.

And they should have told you how satisfying it is to work with an old platform and fix the problems that arise, instead of constantly dealing with the same issues and making the same mistakes over and over again.

But On to the Anecdote

I used to work at a company that built tools to handle CI in clearcase environments. Clearcase is a centralised version control system built by IBM. It's different from git or subversion or probably any other you've worked with. It has its strengths and use cases, but it doesn't always scale well.

A clearcase repository is called a vob (I don't remember why), and IBM had a recommended maximum size for vobs. When they saw that the company I worked at handled vobs a lot larger than that, they actually increased the recommendation, but we still overshot it by an order of magnitude.

At one point my team was working on speeding up a certain step in the CI process, and I started investigating. The script in question was written in perl and spent most of its 12 minutes of runtime querying clearcase for various data. This was more than a decade ago now, but I was quite adept at clearcase and knew what types of queries usually took a long time in our huge vobs. And here's the thing about legacy code: you need to take your time to understand it. It's an investigatory and sometimes almost archaeological process. I like it.

So I skimmed through the couple of thousand lines, seeing what the function names were and where they were called. Then I ran it through the perl debugger to see the actual flow with real data.

Pretty soon I encountered a block of code that looked something like this:


if (verifyClearcaseConditions()) {
    ...
}

Which conditions did that function verify? I looked at the function. It was huge. A large portion of the entire script was dedicated to it, and I could tell that it had grown over time with new if and else statements as more conditions needed to be accounted for.

It queried clearcase a lot, and those were time-consuming queries. One by one, as if ticking off a list.

I started listing the different conditions on a piece of paper, and looked at them both one by one and in conjunction. What would it mean if you combined A and B? What would it mean if you added C?

It turned out the function would always, with no exceptions, return true.

Needless to say, I cut it out, and the script was suddenly 11 minutes faster.
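
The pen-and-paper exercise can also be done mechanically. Here's a minimal sketch in Python (not the original perl), assuming each condition can be reduced to an independent boolean. In the real script the conditions were results of clearcase queries, so treat this as an illustration of the idea only. The function verify_conditions and its inputs a, b and c are made up for this example; they are not the actual checks from the script.

```python
from itertools import product


def verify_conditions(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical check that grew one branch at a time."""
    if a and b:
        return True
    if not a:
        return True
    if c or not b:
        return True
    # The "failure" path someone presumably intended to keep.
    return False


def failing_inputs(check, n_inputs):
    """Return every combination of booleans for which check() is false."""
    combos = product([False, True], repeat=n_inputs)
    return [combo for combo in combos if not check(*combo)]


def always_true(check, n_inputs):
    """True if check() returns true for every possible input."""
    return not failing_inputs(check, n_inputs)


if __name__ == "__main__":
    # An empty list means no input can ever make the check fail.
    print(failing_inputs(verify_conditions, 3))
    print(always_true(verify_conditions, 3))
```

With three conditions there are only eight combinations to try, so brute force is fine. If the list of failing inputs comes back empty, the final "return False" can never be reached and the whole check is a candidate for removal.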

Conclusion

I guess most programmers would find this tedious. And I believe that's why the script looked like it did. Every time a new requirement was introduced, a programmer just added it, because they didn't want to do the work required to understand their job.

Don't do that. A part of your job is to understand the requirements and codebases you work with. This isn't just about the code either: if you properly understand the requirements you'll often end up writing less code, and maybe removing a bunch of code. In my experience a lot of programmers don't want that. They just like to code.

-- CC0 Björn Wärmedal