Employee Churn and Software Rewrites

When knowledge walks out the door, so does the reasoning for many decisions that led to the current architecture along with it.

We all know that rewriting something from scratch is rarely a good idea. The most common time for people to fall into the “big rewrite” trap is when the original authors are no longer around to explain and defend it. Writing code itself is far easier than reading it, so given the chance, often developers will opt to start from scratch. As systems grow, evolve and become increasingly complex, the ability to understand it as a whole goes down. So we take what we know of the system and believe that we can create a more straightforward representation of it by starting again. But in the same vein as Gall’s Law, any complex system has evolved from one simpler that worked, and over time, any rewrite is destined for the same fate if taken over by a new set of developers. It’s too much work to understand the existing system because it’s complex. We don’t understand the system, so we take nothing but assumptions into our rewrite.

So how can you protect yourself from this endless cycle of rewrites? The most straightforward but most challenging solution is to ensure that you don’t have any employee churn. All the authors of the system stay on the team and never leave. Simple in theory, difficult in practice. Another simple but complex solution is to try and architect your software into discrete modules, which are individually easy to understand and refactor, but that can then be composed together to form a more complex system; Think microservices. The trouble is that there is a lot we don’t know about a system when we start, so unless you’re staying on top of your tech debt and refactoring your code to match your evolving understanding of the problem, then you will end up with a “big ball of mud”. Clear and up-to-date documentation can help. Especially documentation that explicitly calls out designs (Design Docs) and individual decisions (ADRs). But who reads documentation anyways?

This isn’t to say that there aren’t times when a rewrite is a good idea. Not all systems built for the challenges of their time will necessarily address the challenges of today, so starting from scratch may be the optimal solution. And when the original authors of the system are present to acknowledge the shortcomings of a system and how a new architecture could address them, that can be a strong signal that it may not be a rewrite for rewrite’s sake.