Software Erosion: How to Detect and Stop It

Software erosion is the most expensive line item nobody budgets for. Not a single bug, not a spectacular outage — but a gradual decay that makes every new feature a little slower, every change a little riskier, and every estimate a little less reliable. In the end, software erosion costs more than any incident that ever made it into a post-mortem — precisely because it never makes it into one.

The tricky part: erosion isn’t a failure. It’s the physics of software. Every system that is in production and keeps evolving erodes — the only question is how early you push back. This article shows how to recognise the decay before it gets expensive, what it actually costs you, and which strategies genuinely stop it — instead of the usual platitudes of “more refactoring, more documentation”.

What software erosion really is (and isn’t)

Software erosion, also called software decay or “architecture drift”, is not the decay of code sitting untouched on a shelf. Code that is never changed doesn’t erode; at most it ages technologically. On the contrary, erosion happens exactly where the most is going on: it’s the growing gap between the architecture you originally designed and the changes reality has forced on it since.

Every requirement “shoved in somehow” under deadline pressure, every shortcut never paid back, every dependency nobody updates — each on its own is harmless. In aggregate, the system drifts step by step away from the clear structure it started with. That’s the core of it: erosion is not one big mistake, but the sum of a thousand small, individually reasonable decisions.

That’s also why you can’t “test it away”. Your tests can be green, your software can work — and the architecture underneath can still be eroding. The damage shows up not in functionality, but in changeability.

How to spot erosion before it gets expensive

The most dangerous moment is the one where erosion is well underway but nobody names it. It hides in sentences spoken in almost every team. The following early warning signs are more reliable than any metric:

What you hear or observe	What’s underneath it
“The comparable feature went twice as fast last year.”	Declining development velocity: the earliest and hardest symptom.
“Nobody likes to touch module X.”	One area is so tangled that even experienced developers avoid it.
“The bug fix created three new bugs.”	Tight, invisible coupling: one change drags unpredictable consequences along.
“Onboarding takes us months.”	The knowledge lives in heads, not in structure and documentation.
“Clean-up has been in the backlog for two years.”	Technical debt that never gets paid down; the interest keeps accruing.
“We don’t dare to update.”	Outdated dependencies nobody can upgrade safely anymore.
Estimates are systematically too low.	The complexity can no longer be grasped, a classic erosion symptom.

If you recognise two or three of these sentences, it’s no reason to panic — but it is a clear signal to look now, instead of waiting until it shows up in the roadmap.

Sound familiar? A vague gut feeling becomes a prioritised roadmap in an architecture review — and the first step is a no-obligation first call.

Why erosion happens — the real causes

The causes are partly technical, partly organisational — and they reinforce one another.

Technical debt with compound interest. Every deliberate shortcut for speed is a loan. That’s legitimate — as long as it gets repaid. If it doesn’t, the interest grows: every further change builds on the shortcut, and at some point the debt is larger than the original code was ever worth.

Knowledge loss. Software lives on implicit knowledge: “why is this built this way?” When a key person leaves without that knowledge being captured in structure, tests, or Architecture Decision Records, you’re left with code nobody understands anymore. Code that isn’t understood doesn’t get refactored; it gets worked around. And working around is erosion in its purest form.

Dependency drift and outdated runtimes. Frameworks, libraries, and runtimes move on. Whoever doesn’t keep up not only collects security holes (an unsupported runtime means unpatched CVEs) but also cuts themselves off from modern tooling, until an update is no longer an update but a migration project.
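Dependency drift can be made visible long before it becomes a migration project. The following is a minimal sketch, not a finished tool: the package names, the version numbers, and the threshold of two major versions are hypothetical, and the version parser assumes purely numeric dotted versions. In a real setup the “latest” data would come from your package registry.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.6.1' into (4, 6, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Drift:
    name: str
    pinned: str
    latest: str
    majors_behind: int


def find_drift(pinned: dict[str, str], latest: dict[str, str]) -> list[Drift]:
    """Return every dependency whose pinned version is older than the latest one."""
    report = []
    for name, pinned_version in sorted(pinned.items()):
        latest_version = latest.get(name)
        if latest_version is None:
            continue  # no upstream information: nothing to compare against
        old, new = parse_version(pinned_version), parse_version(latest_version)
        if old < new:
            report.append(Drift(name, pinned_version, latest_version, new[0] - old[0]))
    return report


# Hypothetical package names and versions, for illustration only.
PINNED = {"web-framework": "2.3.0", "orm": "1.4.2", "json-lib": "5.0.1"}
LATEST = {"web-framework": "5.1.0", "orm": "1.4.9", "json-lib": "5.0.1"}

if __name__ == "__main__":
    for item in find_drift(PINNED, LATEST):
        # Assumed threshold: two or more major versions behind is no longer routine.
        label = "MIGRATION RISK" if item.majors_behind >= 2 else "routine update"
        print(f"{item.name}: {item.pinned} -> {item.latest} ({label})")
```

Run regularly, a report like this keeps the gap small enough that each step stays a routine update.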

Requirement pressure without a refactoring budget. The most common cause is the most mundane: the business demands features, and the plan has no line item for structural care. Erosion then isn’t the result of bad developers, but of a process that systematically fails to plan for maintenance.

Missing architectural boundaries. Where no clear module and dependency boundaries are defined (and enforced), coupling grows on its own. This is exactly what decides whether a system stays maintainable for years, a topic we cover in depth in our article on modular monolith vs. microservices.

What erosion costs — the compound interest of neglect

The damage is real, even if it never shows up as a single line item. It spreads across four accounts:

Velocity. The most expensive account. What takes days early on takes weeks later. Your ability to ship — your actual competitive advantage — declines continuously.
Risk. Every change to an eroded system is a gamble. As coupling rises, so does the probability that a small tweak breaks something far away.
Security and compliance. Outdated runtimes and libraries are open flanks. What passes as “it still runs” is often a regulatory problem already.
People. Erosion is a quiet driver of attrition. Good developers don’t want to fight a system that slows them down every single day.

These four accounts aren’t theory. In a McKinsey survey, CIOs estimated that technical debt amounts to 20–40% of the value of their entire technology estate, and that 10–20% of the budget meant for new products is diverted into servicing that debt. Put differently: up to one in five euros of your innovation budget drains into maintaining code that was once supposed to ship faster. In everyday terms: if your team now needs twice as long for a change as it did two years ago, you’re effectively paying double for the same result.

Nor is this an abstract risk. As a rule of thumb, a system in production already incurs 15–25% of its initial build cost in maintenance every year, plus a modernisation surge of 20–40% roughly every three to five years. Erosion shifts exactly this curve upward: neglect turns a “normal lifecycle” into an expensive overhaul.
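To make those percentages tangible, here is a small worked calculation. It is a sketch under stated assumptions: the build cost of one million euros and the ten-year horizon are hypothetical, and the modernisation surge is assumed to be measured against the initial build cost, just like the yearly maintenance.

```python
def lifecycle_cost(build_cost: int, years: int, maintenance_pct: int,
                   surge_pct: int, surge_interval_years: int) -> int:
    """Running cost after `years` in production, excluding the initial build.

    Percentages are whole numbers relative to the initial build cost.
    """
    maintenance = build_cost * maintenance_pct * years // 100
    completed_surges = years // surge_interval_years
    modernisation = build_cost * surge_pct * completed_surges // 100
    return maintenance + modernisation


BUILD_COST = 1_000_000  # hypothetical initial build cost in euros
YEARS = 10              # hypothetical planning horizon

# Lower bound: 15% maintenance per year, a 20% surge every five years.
low = lifecycle_cost(BUILD_COST, YEARS, 15, 20, 5)
# Upper bound: 25% maintenance per year, a 40% surge every three years.
high = lifecycle_cost(BUILD_COST, YEARS, 25, 40, 3)

if __name__ == "__main__":
    print(f"low:  {low:,} EUR ({low / BUILD_COST:.1f}x the build cost)")
    print(f"high: {high:,} EUR ({high / BUILD_COST:.1f}x the build cost)")
```

Under these assumptions, ten years of operation cost between 1.9 and 3.7 times the initial build, and that is the baseline before any erosion pushes the curve upward.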

And the line item grows the moment you ignore it: 60% of the CIOs McKinsey surveyed saw their debt burden rise over the preceding three years. That’s the real trap of erosion: the cheapest time to act against it was a year ago; the second cheapest is today. Every postponed month turns a small refactoring into a larger one, and eventually an update into a migration project. That’s exactly the tipping point Omniga’s application landscape reached (more on that below), when an “outdated runtime” became a full modernisation programme.

Refactor, rewrite, or strangler-fig? The real decision

Once erosion has built up, you face the question almost every team gets wrong: clean up continuously, rebuild everything — or replace it piece by piece?

Approach	When it makes sense	Risk	Reality
Continuous refactoring	Debt is locally contained, the architecture is sound at its core	low	The normal case — maintenance during operation, built into the sprint.
Big-bang rewrite	Almost never	very high	Sounds tempting, usually fails: the old system keeps running, the new one swallows years, and erosion starts over in the rebuild.
Strangler-fig / incremental replacement	Architecture is at its limit, but operations can’t stop	medium	The pragmatic middle path: replace piece by piece without interrupting the business.

The honest answer is almost always: not the grand gesture, but the first clean cut. A big-bang rewrite trades a known, eroded system for an unknown, unfinished one — and doesn’t pay down the debt, it relocates it. The strangler-fig strategy — gradually replacing the old with the new while both run in parallel — is almost always superior in practice, because it breaks the risk into manageable stages.
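The mechanics of a strangler-fig replacement can be sketched in a few lines. This is an illustrative model, not a production router: the class `StranglerFacade`, the two handler functions, and the path prefixes are all hypothetical, and in practice the facade is usually a reverse proxy or API gateway rather than application code.

```python
from typing import Callable

Handler = Callable[[str], str]


def legacy_system(path: str) -> str:
    """Stand-in for the old, eroded application."""
    return f"legacy handled {path}"


def new_billing_service(path: str) -> str:
    """Stand-in for the first module that has been rebuilt."""
    return f"new billing handled {path}"


class StranglerFacade:
    """Single entry point that routes each request to the old or the new system."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        """Cut one path prefix over to its replacement."""
        self._migrated[prefix] = handler

    def rollback(self, prefix: str) -> None:
        """Send a prefix back to the legacy system if the replacement misbehaves."""
        self._migrated.pop(prefix, None)

    def handle(self, path: str) -> str:
        # Longest matching prefix wins, so a sub-path can move before its parent.
        for prefix in sorted(self._migrated, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self._migrated[prefix](path)
        return self._legacy(path)  # everything not yet migrated stays on the old system


if __name__ == "__main__":
    facade = StranglerFacade(legacy_system)
    facade.migrate("/billing", new_billing_service)
    print(facade.handle("/billing/invoices"))
    print(facade.handle("/orders/17"))
```

The point of the design is that each cut-over is small and reversible: one prefix moves at a time, and `rollback` puts it back on the old system without touching anything else.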

Preventing erosion before it starts

The cheapest erosion is the one that never happens in the first place. Prevention costs a fraction of remediation — and consists of concrete practices, not good intentions:

Enforce architectural boundaries in CI. Module and dependency boundaries are recorded not in a document but as automated tests (architecture fitness functions) that turn a forbidden access into a build failure. What CI forbids doesn’t erode.
Architecture Decision Records (ADRs). Every important decision is captured with its why. This turns implicit head knowledge into explicit, verifiable knowledge — and it survives any staff change.
Refactoring as part of the Definition of Done. Clean-up isn’t a separate project you do “someday”, but part of every story. Debt is paid down where it arises.
Regular architecture reviews. A sober outside look before the drift sets in — preventive, not reactive.
Dependency updates as routine. Small, regular updates are harmless. The dreaded “big jump” only happens when you postpone the small ones for years.

The common denominator: erosion isn’t stopped by one-off heroics, but by discipline built into the normal flow of work.
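As an illustration of the first practice, an architecture fitness function can be as small as the sketch below. It uses Python’s standard `ast` module to find imports that cross a forbidden boundary. The package names (`shop.domain`, `shop.web`, `shop.database`) and the rule table are hypothetical, and the sketch only looks at absolute imports; relative imports are ignored.

```python
import ast

# Hypothetical layering rule: a package (key) must not import from the packages in its value.
FORBIDDEN = {
    "shop.domain": {"shop.web", "shop.database"},
    "shop.database": {"shop.web"},
}


def imported_modules(source: str) -> set[str]:
    """Collect the absolute module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


def belongs_to(module: str, package: str) -> bool:
    """True if `module` is `package` itself or one of its submodules."""
    return module == package or module.startswith(package + ".")


def violations(module_name: str, source: str) -> list[str]:
    """Return one message per import that crosses a forbidden boundary."""
    found = []
    for package, banned in FORBIDDEN.items():
        if not belongs_to(module_name, package):
            continue
        for imported in sorted(imported_modules(source)):
            if any(belongs_to(imported, target) for target in banned):
                found.append(f"{module_name} must not import {imported}")
    return found


if __name__ == "__main__":
    sample = "from shop.web.views import render\nimport json\n"
    for message in violations("shop.domain.orders", sample):
        print(message)
```

Wired into the test suite so that any non-empty result fails the build, a check like this turns the boundary from a convention into something the pipeline enforces on every commit.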

Erosion in the age of AI — accelerated or curbed?

Now that AI assistants produce code in minutes, the topic has taken on a new urgency. The tools are a double-edged sword.

Without discipline, they accelerate erosion. Letting an LLM generate code unchecked gives you more code, faster — but not necessarily understood code. More volume with less comprehension is exactly the recipe for architecture drift, just in fast-forward.

Used well, they’re one of the best remedies against it. Precisely the work teams most like to postpone (backfilling tests, mechanical refactorings, working through dependency upgrades, explaining undocumented code) becomes far cheaper with AI. The craft hasn’t changed: it still lies in the architecture, in the boundaries, in the why. AI here is a tool, not the architect. Discipline decides which edge you end up facing.

From practice: pushing back erosion at Omniga

Our project with Omniga GmbH & Co. KG, a multi-brand IT company from Regensburg, Germany, shows what this looks like in reality. Over the years, a heterogeneous application landscape had grown organically there on .NET Framework 4.6: functionally stable, but with high technical debt, tight coupling, and a runtime past its support window. A textbook case of eroded software.

Instead of a risky big-bang rewrite, we migrated the entire landscape incrementally to .NET 8, prioritised by criticality, with enforced architectural boundaries and modernised CI/CD pipelines. The result: substantially reduced technical debt, a unified platform for internal systems and customer-facing SaaS products, and zero downtime throughout the migration. The full story is in our write-up of the .NET legacy modernisation for Omniga.

Conclusion

Software erosion isn’t a failure, but the normal state of every living system. You can’t abolish it — but you can control it. The difference between software that still holds up in five years and software that becomes an overhaul case lies not in the absence of erosion, but in early, disciplined push-back: visible architectural boundaries, debt paid down, dependencies kept current, a neutral outside look before the drift sets in.

So the question isn’t whether your software erodes — but how early you start pushing back.

Honestly assessing the state of an architecture is architect’s work — with us, a software architect does it, not a junior with a checklist. We name risks clearly, tell you just as plainly where there’s no need to act, and assess neutrally — without automatically pitching for the follow-up engagement. What comes out is a software architecture roadmap you could execute even without us. The only open question is how early you start — before erosion decides it for you.
