Saga Pattern

Sagas manage distributed workflows without global transactions. Learn choreography vs orchestration, compensations, idempotent steps, and failure handling to avoid stuck orders and inconsistent side effects.

Why Sagas Exist

  • Distributed systems rarely support safe global ACID transactions across services.
  • Workflows span services: order, payment, inventory, shipping.
  • Sagas provide eventual consistency with explicit steps and compensations.

Saga Types

  • Choreography: services react to events and trigger next steps.
  • Orchestration: a coordinator issues commands and tracks progress.
  • Production rule: choose based on workflow complexity and how much end-to-end visibility you need, not ideology.
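
As a rough illustration of choreography, the sketch below wires services together through an in-memory event bus. The `EventBus` class and the event names (`OrderPlaced`, `PaymentCaptured`, and so on) are hypothetical stand-ins for a real broker and real service handlers; no coordinator appears anywhere, and each handler only knows which event it reacts to and which event it emits next.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy in-memory bus; a stand-in for a real message broker (assumption)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []  # order in which events were delivered

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Deliver events until no handler publishes anything new.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.log.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


bus = EventBus()
# Each "service" reacts to one event and triggers the next step.
bus.subscribe("OrderPlaced", lambda o: bus.publish("PaymentCaptured", o))
bus.subscribe("PaymentCaptured", lambda o: bus.publish("InventoryReserved", o))
bus.subscribe("InventoryReserved", lambda o: bus.publish("ShipmentScheduled", o))

bus.publish("OrderPlaced", {"order_id": "o-1"})
bus.run()
```

The workflow order exists only implicitly in the subscriptions, which is what makes choreography light to start with and harder to trace as it grows.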

Core Design Requirements

  • Each step must be idempotent.
  • Each step must have a compensation or a defined terminal failure policy.
  • State must be persisted for recovery after crashes.
  • Timeouts are part of correctness and must be explicit.
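
One common way to make a step idempotent is to key each request with an idempotency key and return the stored result on a repeat. The sketch below assumes a hypothetical `PaymentService` with an in-memory dictionary; a real service would keep the keys in durable storage so they survive a crash.

```python
class PaymentService:
    """Hypothetical payment step that tolerates redelivered commands."""

    def __init__(self):
        self.processed = {}  # idempotency key -> stored result
        self.charges = []    # side effects actually performed

    def charge(self, idempotency_key, order_id, amount_cents):
        # A repeated key returns the earlier result without charging again.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        self.charges.append((order_id, amount_cents))
        result = {"order_id": order_id, "status": "charged"}
        self.processed[idempotency_key] = result
        return result


svc = PaymentService()
first = svc.charge("saga-1:charge", "o-1", 1200)
second = svc.charge("saga-1:charge", "o-1", 1200)  # simulated redelivery
```

Deriving the key from the saga id and step name, as assumed here, means a retry of the same step always maps to the same key.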

Compensation vs Rollback

  • Compensation is a new business action that semantically undoes a previous step.
  • Compensation may be imperfect: for example, issuing a refund instead of reversing the authorization.
  • Production rule: define acceptable outcomes for partial completion.
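
The difference from a rollback can be sketched with a small orchestrated runner: each step pairs an action with a compensating action, and on failure the completed steps are compensated in reverse order. The function `run_saga`, the step names, and the in-memory `log` are illustrative assumptions, and the sketch does not handle a compensation that itself fails.

```python
class SagaFailed(Exception):
    pass


def run_saga(steps, ctx):
    """Run (name, action, compensation) steps in order.

    If a step raises, run the compensations of already completed
    steps in reverse order, then raise SagaFailed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action(ctx)
            completed.append((name, compensate))
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                if undo is not None:
                    undo(ctx)
            raise SagaFailed(f"step {name!r} failed: {exc}") from exc
    return ctx


log = []


def fail_shipping(ctx):
    raise RuntimeError("no carrier available")


steps = [
    ("reserve_inventory", lambda c: log.append("reserve"), lambda c: log.append("release")),
    ("capture_payment", lambda c: log.append("capture"), lambda c: log.append("refund")),
    ("schedule_shipping", fail_shipping, None),
]

try:
    run_saga(steps, {"order_id": "o-1"})
    outcome = "completed"
except SagaFailed:
    outcome = "compensated"
```

Note that the compensations are new forward actions (`refund`, `release`) appended to the log; nothing is erased, which is the sense in which compensation differs from a database rollback.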

Choreography Failure Risks

  • Hidden coupling through events and implicit ordering assumptions.
  • End-to-end visibility is harder without correlation ids and tracing.
  • Poison messages can stall a step and block progress.
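
A poison message stops blocking progress once retries are capped and the message is moved aside. The sketch below assumes a simple in-memory queue, a hypothetical `MAX_ATTEMPTS` of 3, and a handler that always fails on one payload; real brokers usually provide their own redelivery counters and dead letter routing.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry cap


def drain(queue, handler, dead_letters):
    """Process messages; park any message that fails MAX_ATTEMPTS times."""
    while queue:
        message = queue.popleft()
        try:
            handler(message["body"])
        except Exception as exc:
            message["attempts"] += 1
            message["last_error"] = str(exc)
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(message)  # stop retrying, keep for replay
            else:
                queue.append(message)  # retry later, behind other messages


processed = []
dead_letters = []


def handle(body):
    if body == "corrupt":
        raise ValueError("cannot parse payload")
    processed.append(body)


queue = deque({"body": b, "attempts": 0} for b in ["a", "corrupt", "b"])
drain(queue, handle, dead_letters)
```

Requeueing at the back, as done here, lets healthy messages through but changes delivery order, so it only suits steps that do not depend on strict ordering.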

Orchestration Failure Risks

  • Coordinator becomes a bottleneck or single point of failure if not replicated.
  • State machine bugs can cause stuck workflows.
  • Complexity moves into the orchestrator codebase.
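
One possible mitigation for stuck workflows is a periodic sweep that flags any non-terminal saga whose current step has passed its deadline. The record layout, state names, and `find_stuck` function below are assumptions made for illustration; the sweep only detects stuck sagas and leaves the retry or compensation decision to the orchestrator.

```python
from dataclasses import dataclass


@dataclass
class SagaRecord:
    saga_id: str
    state: str
    deadline: float  # epoch seconds by which the current step must finish


# States in which no further work is expected (assumed names).
TERMINAL = {"COMPLETED", "COMPENSATED", "FAILED"}


def find_stuck(records, now):
    """Return ids of sagas that are still in flight past their deadline."""
    return [r.saga_id for r in records if r.state not in TERMINAL and now > r.deadline]


records = [
    SagaRecord("s-1", "PAYMENT_REQUESTED", deadline=100.0),
    SagaRecord("s-2", "COMPLETED", deadline=100.0),
    SagaRecord("s-3", "INVENTORY_REQUESTED", deadline=500.0),
]
stuck = find_stuck(records, now=200.0)
```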

Operational Patterns

  • Use correlation ids across all commands and events.
  • Persist saga state as a state machine with explicit transitions.
  • Use the outbox pattern for reliable publishing: write the event in the same local transaction as the state change, then relay it.
  • Use dead letter queues and replay tooling.
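
The persisted state and outbox ideas above can be sketched together using SQLite from the Python standard library. The table layout, the `advance` and `relay` functions, and the event names are hypothetical; the point is that the saga state and the outgoing event are committed in one local transaction, and a separate relay publishes unpublished rows with the saga id as correlation id.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE saga (saga_id TEXT PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        saga_id TEXT NOT NULL,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def advance(saga_id, new_state, event_type, payload):
    """Persist the new saga state and its outgoing event atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO saga (saga_id, state) VALUES (?, ?)",
            (saga_id, new_state),
        )
        conn.execute(
            "INSERT INTO outbox (saga_id, event_type, payload) VALUES (?, ?, ?)",
            (saga_id, event_type, json.dumps(payload)),
        )


def relay(publish):
    """Publish pending outbox rows in order, marking each one afterwards."""
    rows = conn.execute(
        "SELECT id, saga_id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, saga_id, event_type, payload in rows:
        publish(event_type, {"correlation_id": saga_id, **json.loads(payload)})
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
advance("saga-1", "PAYMENT_REQUESTED", "ChargeCard", {"amount_cents": 1200})
first = relay(lambda t, p: sent.append((t, p)))
second = relay(lambda t, p: sent.append((t, p)))
```

Because the relay publishes before it marks a row, a crash between the two would publish the event again on restart. Delivery is therefore at least once, which is one reason consumers need idempotent steps.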

Failure Modes

  • Stuck saga: step never completes due to missing event or consumer lag.
  • Duplicate step: message redelivery triggers a side effect twice.
  • Compensation failure: refund fails and workflow remains inconsistent.
  • Out of order events: state machine applies transitions incorrectly.
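
Two of these failure modes, duplicate steps and out of order events, can be guarded against with an explicit transition table plus a set of already applied event ids. The states, event names, and `Saga` class below are assumptions for illustration; a rejected event is left for the caller to requeue or park rather than being applied to the wrong state.

```python
# (current state, event type) -> next state; anything else is not allowed.
ALLOWED = {
    ("STARTED", "PaymentCaptured"): "PAID",
    ("PAID", "InventoryReserved"): "RESERVED",
    ("RESERVED", "ShipmentScheduled"): "COMPLETED",
}


class Saga:
    def __init__(self):
        self.state = "STARTED"
        self.seen = set()  # ids of events already applied

    def apply(self, event_id, event_type):
        if event_id in self.seen:
            return "duplicate"  # redelivery: ignore, no second side effect
        next_state = ALLOWED.get((self.state, event_type))
        if next_state is None:
            return "rejected"  # out of order: caller may requeue for later
        self.seen.add(event_id)
        self.state = next_state
        return "applied"


saga = Saga()
results = [
    saga.apply("e2", "InventoryReserved"),  # arrives before payment
    saga.apply("e1", "PaymentCaptured"),
    saga.apply("e1", "PaymentCaptured"),    # redelivered
    saga.apply("e2", "InventoryReserved"),  # retried after payment
]
```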

Production Checklist

  • All steps are idempotent and have retry caps.
  • Compensation is defined and tested for each reversible step.
  • Saga state is persisted and recoverable after restart.
  • Correlation ids and tracing provide end-to-end visibility.
  • Runbooks exist for replay, manual intervention, and compensation.