Refactoring, Tests as Guardrails
Refactoring is production risk management
Refactoring means improving internal structure without changing external behavior. In production systems, refactoring is not optional: systems evolve, requirements change, and tech debt accumulates. But unsafe refactoring is a common cause of incidents.
Refactoring vs rewriting (do not confuse them)
- Refactoring: behavior stays the same, structure improves.
- Rewriting: the implementation is replaced from scratch, so existing behavior must be rediscovered and revalidated.
Rewrites often fail because hidden behavior in the old system is lost. If the system works in production, assume it contains implicit requirements.
Production rule: prefer incremental refactor over big-bang rewrite
Small changes are easier to review, test, deploy, and roll back.
Tests as guardrails (what they actually do)
Tests are not only for correctness. They provide confidence that refactoring did not break behavior. Guardrails come in different forms:
- Unit tests: protect pure logic.
- Integration tests: protect boundaries (DB, HTTP).
- Contract tests: protect compatibility between services/clients.
Legacy code problem: no tests, unclear behavior
When refactoring legacy systems, you often do not know the intended behavior. If you refactor without capturing behavior first, you risk changing semantics unintentionally.
Characterization tests: capture current behavior
A characterization test describes what the system currently does. It does not judge if it is correct. It freezes behavior so you can refactor safely.
Example pattern (conceptual)
// Characterization test: keep current behavior stable
@Test
void pricing_behavior_is_preserved() {
    Money price = legacyPricingEngine.calculate(inputScenario());
    // Expected value was recorded from the current system, not derived from a spec.
    assertEquals(new Money(12345, "TRY"), price);
}
Once behavior is captured, you can refactor internal structure with confidence.
Golden Master testing (when outputs are complex)
For complex outputs (reports, large JSON), golden master tests snapshot outputs and compare them after changes.
- Run system with fixed inputs.
- Store outputs as 'golden' snapshots.
- After refactor, compare outputs.
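The three steps above can be sketched in Java as follows. This is a minimal illustration, not a prescribed tool: the class name GoldenMaster, the matches method, and the record-on-first-run behavior are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the name and the snapshot layout are illustrative only.
final class GoldenMaster {
    private GoldenMaster() {}

    /**
     * Returns true when `actual` equals the stored snapshot.
     * If no snapshot exists yet, the output is recorded and accepted.
     */
    static boolean matches(Path snapshot, String actual) {
        try {
            if (Files.notExists(snapshot)) {
                Path parent = snapshot.getParent();
                if (parent != null) {
                    Files.createDirectories(parent);
                }
                Files.write(snapshot, actual.getBytes(StandardCharsets.UTF_8));
                return true;
            }
            String expected =
                new String(Files.readAllBytes(snapshot), StandardCharsets.UTF_8);
            return expected.equals(actual);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would call GoldenMaster.matches(path, renderedOutput) and fail when it returns false. Recording on first run is convenient locally, but in CI it is usually safer to require that the snapshot already exists and is committed to version control.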
Production caveat
Golden master tests become brittle when outputs include timestamps, random IDs, or nondeterministic ordering. Stabilize outputs by:
- injecting Clock
- fixing random seeds
- sorting collections
- normalizing volatile fields
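As a rough sketch of the last two techniques, the snippet below replaces volatile values with placeholders and sorts lines before they are compared to a snapshot. The class name SnapshotNormalizer, the placeholder strings, and the two regular expressions are assumptions chosen for illustration; real outputs may need different patterns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical normalizer: patterns and placeholders are assumptions.
final class SnapshotNormalizer {
    // Matches UTC ISO-8601 instants such as 2024-01-02T03:04:05.678Z
    private static final Pattern ISO_TIMESTAMP =
        Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?Z");
    // Matches the canonical 8-4-4-4-12 hexadecimal UUID form
    private static final Pattern UUID_PATTERN = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private SnapshotNormalizer() {}

    /** Replaces volatile values in one line with fixed placeholders. */
    static String normalizeLine(String line) {
        String out = ISO_TIMESTAMP.matcher(line).replaceAll("<TIMESTAMP>");
        return UUID_PATTERN.matcher(out).replaceAll("<UUID>");
    }

    /** Normalizes every line, then sorts so iteration order cannot cause diffs. */
    static List<String> normalize(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(normalizeLine(line));
        }
        Collections.sort(result);
        return result;
    }
}
```

Sorting is only appropriate when order is not part of the observable behavior. If callers depend on ordering, keep it in the snapshot and make the ordering deterministic at the source instead.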
Safe refactoring workflow (repeatable)
- 1) Add tests around the area you will change.
- 2) Make a small refactor (rename, extract method, introduce parameter).
- 3) Run tests locally and in CI.
- 4) Commit small changes frequently.
- 5) Deploy incrementally (canary).
Make changes reversible
A refactor should be easy to roll back. Avoid mixing refactor with feature changes in the same diff. When you mix them:
- review becomes hard
- root cause becomes ambiguous
- rollback reverts both behavior and structure unexpectedly
Feature flags for refactor safety
Feature flags are not only for product features. They can be used to ship a refactor safely by toggling between old and new implementations.
Strangler pattern (old vs new side-by-side)
public Money price(Order o) {
    if (flags.useNewPricing()) {
        return newPricing.price(o);
    }
    return oldPricing.price(o);
}
This enables:
- gradual rollout
- fast rollback
- production comparison (shadow mode) if needed
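One possible way to get gradual rollout is a percentage-based flag that buckets requests by a stable key. The sketch below is an assumption about how such a flag might look, not how any particular flag library works; PercentageRollout and isEnabledFor are hypothetical names.

```java
// Hypothetical flag: a real system would usually read the percentage
// from a flag service or configuration rather than a constructor argument.
final class PercentageRollout {
    private final int percent; // share of keys routed to the new path, 0..100

    PercentageRollout(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        this.percent = percent;
    }

    /**
     * The same key always lands in the same bucket, so a given order
     * consistently sees the same implementation while the percentage is unchanged.
     */
    boolean isEnabledFor(String key) {
        int bucket = Math.floorMod(key.hashCode(), 100); // 0..99
        return bucket < percent;
    }
}
```

Setting the percentage back to 0 sends every request to the old implementation, which is what makes rollback fast. Raising it in small steps gives the gradual rollout.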
Shadow mode (advanced but powerful)
You can run both implementations and compare outputs without affecting users:
- return old result
- compute new result in parallel
- log differences with correlation id
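Example shadow comparison (conceptual; runs sequentially for brevity)
Money old = oldPricing.price(o);
try {
    Money newer = newPricing.price(o);
    if (!old.equals(newer)) {
        log.warn("pricing_diff orderId={} old={} new={}", o.id(), old, newer);
    }
} catch (RuntimeException e) {
    // A failure in the new path must never affect the user-facing result.
    log.warn("pricing_shadow_error orderId={}", o.id(), e);
}
return old;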
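To compute the new result off the request path, the comparison can be handed to an executor. The following is a minimal sketch under stated assumptions: ShadowRunner is a hypothetical generic wrapper, and the onDiff callback stands in for whatever logging or metrics the system uses.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical wrapper: all names here are illustrative.
final class ShadowRunner<I, O> {
    private final Function<I, O> oldImpl;
    private final Function<I, O> newImpl;
    private final Executor shadowExecutor;
    private final BiConsumer<I, String> onDiff; // input plus a description of the mismatch

    ShadowRunner(Function<I, O> oldImpl, Function<I, O> newImpl,
                 Executor shadowExecutor, BiConsumer<I, String> onDiff) {
        this.oldImpl = oldImpl;
        this.newImpl = newImpl;
        this.shadowExecutor = shadowExecutor;
        this.onDiff = onDiff;
    }

    /** Always returns the old result; the new path only produces diagnostics. */
    O run(I input) {
        O oldResult = oldImpl.apply(input);
        CompletableFuture
            .supplyAsync(() -> newImpl.apply(input), shadowExecutor)
            .whenComplete((newResult, error) -> {
                if (error != null) {
                    onDiff.accept(input, "new implementation failed: " + error);
                } else if (!Objects.equals(oldResult, newResult)) {
                    onDiff.accept(input, "old=" + oldResult + " new=" + newResult);
                }
            });
        return oldResult;
    }
}
```

In production the executor would typically be a bounded pool, so that a slow new implementation cannot exhaust threads. Shadow mode is only safe when the new path has no side effects such as writes or outbound calls.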
Refactor boundaries: seams and adapters
If code is hard to test, create a seam:
- extract interface for external dependency
- wrap static calls behind an adapter
- inject Clock instead of Instant.now()
These seams make the code testable, which in turn makes refactoring safe.
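A small sketch of two such seams follows. DiscountPolicy, IdSource, and UuidIdSource are hypothetical names invented for the example; java.time.Clock and java.util.UUID are standard JDK classes.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.UUID;

// Seam 1: time is read through an injected Clock instead of Instant.now().
final class DiscountPolicy {
    private final Clock clock;
    private final Instant campaignEnd;

    DiscountPolicy(Clock clock, Instant campaignEnd) {
        this.clock = clock;
        this.campaignEnd = campaignEnd;
    }

    boolean isCampaignActive() {
        return clock.instant().isBefore(campaignEnd);
    }
}

// Seam 2: a static call (UUID.randomUUID) hidden behind a small interface.
interface IdSource {
    String nextId();
}

final class UuidIdSource implements IdSource {
    @Override
    public String nextId() {
        return UUID.randomUUID().toString();
    }
}
```

Production code would pass Clock.systemUTC() and new UuidIdSource(). A test can pass Clock.fixed(...) and a lambda such as () -> "id-1", which makes results repeatable.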
Common refactoring traps in production
- Large PRs: too hard to review, more likely to ship bugs.
- No tests: refactor becomes rewrite by accident.
- Mixing behavior changes: root cause unclear, rollbacks painful.
- Refactoring hot paths blindly: performance regressions.
- Not measuring: no baseline metrics for success.
Operational safety: measure and observe
Before and after refactor, compare:
- latency
- error rates
- CPU and memory
- database query counts
This helps confirm that the refactor did not introduce hidden regressions.
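For latency, one simple way to make the before/after comparison concrete is to compare a high percentile of two sample sets against a tolerance. The sketch below is illustrative only: LatencyComparison, the nearest-rank percentile method, and the allowedRatio parameter are assumptions, and most teams would read these numbers from their metrics system instead.

```java
import java.util.Arrays;

// Hypothetical helper for comparing latency samples collected before and after a refactor.
final class LatencyComparison {
    private LatencyComparison() {}

    /** Nearest-rank percentile; `p` is expected to be in (0, 100]. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True when the candidate p95 exceeds the baseline p95 by more than allowedRatio. */
    static boolean regressed(long[] baseline, long[] candidate, double allowedRatio) {
        return percentile(candidate, 95) > percentile(baseline, 95) * allowedRatio;
    }
}
```

With allowedRatio set to 1.1, a candidate whose p95 is more than 10% above the baseline is flagged. The threshold itself is a judgment call that depends on how noisy the measurements are.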
Checklist
- Prefer incremental refactor over big-bang rewrites.
- Use tests as guardrails; add characterization tests for legacy behavior.
- Keep changes small and reversible; avoid mixing refactor with features.
- Use feature flags to roll out refactors safely.
- Consider shadow mode for high-risk refactors.
- Measure production impact (latency, errors, resources).
Final principle
Refactoring is how you keep a system alive. Done safely, it reduces long-term risk. Done recklessly, it creates incidents. Tests are the difference.