DOTNET Contents

Rollbacks vs Forward-Fix (When Each Wins)

Rollbacks are not a strategy unless they are fast, safe, and rehearsed. Many incidents cannot be rolled back because migrations are destructive or data is already written in the new format. Learn when to roll back and when to forward-fix.

On this page

Production incident

A new release introduces a bug. The team tries to roll back, but the migration dropped a column. Old version crashes. Now rollback is impossible. You must forward-fix under pressure. Another incident: rollback would have been safe, but it took 30 minutes because image tags were not immutable and the pipeline was slow. A rollback that takes an hour is not rollback, it is a second incident.

Symptoms

  • Rollback fails due to schema incompatibility.
  • Rollback is slow and manual; people hesitate and waste time.
  • Forward-fix is rushed and risky because rollback was not viable.

Root causes

  • Destructive migrations in the same release.
  • No expand/contract discipline.
  • No immutable artifact versioning.
  • No rehearsed rollback procedure.

Decision rule: rollback vs forward-fix

  • Rollback when: bug is clearly introduced by the release, data compatibility is safe, and rollback is fast (< minutes).
  • Forward-fix when: schema/data has changed incompatibly, or the bug requires a quick patch without reverting state.
  • Mitigate first: reduce blast radius (feature flags off, rate limits, disable endpoint) while you decide.

Anti-pattern

  • “We’ll just roll back” without verifying schema compatibility.
  • Rolling back repeatedly, flapping versions, causing cache storms and unstable behavior.
  • Forward-fixing without canary, causing a second regression.

Correct pattern

  • Make rollback possible: immutable versions, blue/green traffic switch, non-destructive migrations.
  • Have feature flags to disable risky paths without deploy.
  • Rehearse rollback quarterly; treat it like a fire drill.

Operational notes

  • Monitoring: rollback success rate, time-to-rollback, and incident recurrence after rollback.
  • Rollout: canary releases reduce the need for rollbacks.
  • Postmortem: every rollback failure is a design failure. Fix the process and the deployment model.

Checklist

  • Rollback is a traffic switch, not a rebuild.
  • Artifacts are immutable and versioned.
  • Migrations are compatible with rollback (expand/contract).
  • Feature flags exist for high-risk behaviors.
  • Rollback is rehearsed and documented.