CAP Theorem in Practice

CAP describes a tradeoff that applies under partition: each operation chooses between consistency and availability. Learn how real systems mix modes, define SLAs, and avoid stale-cache authorization errors and split brain during incidents.

What CAP Actually Says

  • When a network partition happens, you must trade between consistency and availability.
  • Partition tolerance is not optional in distributed systems.
  • CAP is about behavior under partition, not normal operation.

Operational Interpretation

  • Consistency: every read reflects the latest acknowledged write, as if all operations ran in a single valid order (linearizability).
  • Availability: every request to a reachable, non-failed node receives a non-error response without waiting for unreachable nodes.
  • Under partition: you either reject some operations or serve potentially stale or divergent data.

Per Operation Choices

  • Read path can be more available than write path.
  • Some operations can tolerate staleness, others cannot.
  • Strong consistency is often required for money movement and permissions.
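
A per-operation choice can be written down as a small policy table. The sketch below is illustrative only: the operation names, the `Mode` enum, and the `decide` function are assumptions made for this example, not part of any particular system.

```python
from enum import Enum


class Mode(Enum):
    STRONG = "strong"      # needs a quorum or the leader; reject otherwise
    EVENTUAL = "eventual"  # may be served from any reachable replica


# Hypothetical operation names, chosen only for illustration.
POLICY = {
    "transfer_funds": Mode.STRONG,
    "check_permission": Mode.STRONG,
    "view_profile": Mode.EVENTUAL,
    "list_activity_feed": Mode.EVENTUAL,
}


def decide(operation: str, quorum_reachable: bool) -> str:
    """Return 'serve', 'serve_stale', or 'reject' for one request."""
    # Operations missing from the table default to the safe (strong) mode.
    mode = POLICY.get(operation, Mode.STRONG)
    if quorum_reachable:
        return "serve"
    return "reject" if mode is Mode.STRONG else "serve_stale"
```

Under partition, `decide("transfer_funds", False)` returns `"reject"` while `decide("view_profile", False)` returns `"serve_stale"`. Defaulting unknown operations to the strong mode is a design choice of this sketch: a missing entry then fails closed rather than serving stale data.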

Common Patterns

  • CP systems: reject writes that cannot reach a quorum during partition to preserve consistency.
  • AP systems: accept writes and reconcile later, using conflict resolution.
  • Hybrid: use CP for critical metadata, AP for high volume events.
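
For the AP pattern, "reconcile later" needs a concrete conflict resolution rule. One common and simple rule is last-writer-wins (LWW), sketched below. The `Versioned` record, its fields, and the `reconcile` function are assumptions for illustration. LWW silently discards the losing concurrent write, so it only suits data where that loss is acceptable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # logical or wall-clock timestamp (an assumption of this sketch)
    node_id: str    # tie-breaker so both sides pick the same winner


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Pick the write with the highest (timestamp, node_id) pair."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


def reconcile(left: dict, right: dict) -> dict:
    """Merge two replicas' key-value state after a partition heals."""
    merged = dict(left)
    for key, candidate in right.items():
        if key in merged:
            merged[key] = lww_merge(merged[key], candidate)
        else:
            merged[key] = candidate
    return merged
```

Because the winner depends only on the `(timestamp, node_id)` pair, merging in either direction gives the same result, so both sides of a healed partition converge to the same state.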

Quorums as a Dial

  • Quorum reads and writes can increase consistency but reduce availability during failures.
  • Lower quorum increases availability but can serve stale reads.
  • Production rule: define quorum policy per dataset and operation.
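
The dial can be made concrete with the usual quorum arithmetic: with N replicas, a read quorum of R and a write quorum of W are guaranteed to share at least one replica when R + W > N. The helper below is a minimal sketch. The function name and result keys are assumptions for this example, and it models strict quorums only.

```python
def quorum_properties(n: int, r: int, w: int) -> dict:
    """Summarize what a strict (N, R, W) quorum configuration provides."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {
        # Every read quorum shares at least one replica with every write quorum.
        "read_write_overlap": r + w > n,
        # Any two write quorums share a replica, so two sides of a
        # partition cannot both accept writes.
        "write_write_overlap": 2 * w > n,
        # Number of replicas that can be down while the operation still succeeds.
        "read_tolerates_failures": n - r,
        "write_tolerates_failures": n - w,
    }
```

For example, `quorum_properties(3, 2, 2)` reports overlapping reads and writes while tolerating one failed replica on each path. `quorum_properties(3, 1, 1)` tolerates two failures but loses the overlap, which is where stale reads come from.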

Failure Modes

  • Split brain due to weak leader election or stale leases.
  • Stale cached permissions produce wrong authorization decisions, for example still granting access after a revocation.
  • Read-your-writes breaks when reads are routed to lagging replicas.
  • Reconciliation surfaces conflicting writes that cannot be resolved without a defined policy.
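
One way to guard against the read-your-writes failure is to hand the client a version token on each write and serve a read from a replica only if that replica has caught up to the token. The sketch below assumes a single leader with a monotonically increasing version. The `Node` class and the `write` and `read` functions are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    applied_version: int = 0  # highest write version this node has applied
    data: dict = field(default_factory=dict)


def write(leader: Node, key: str, value: str) -> int:
    """Apply a write on the leader and return a version token for the client."""
    leader.applied_version += 1
    leader.data[key] = value
    return leader.applied_version


def read(key: str, min_version: int, replica: Node, leader: Node):
    """Read from the replica only if it has applied the client's last write."""
    source = replica if replica.applied_version >= min_version else leader
    return source.data.get(key), source.name
```

A client that has just written passes its token as `min_version`. While the replica lags, the read falls back to the leader. Once replication catches up, the same read is served from the replica.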

Incident Playbook

  • Detect partition: rising timeouts, asymmetric reachability, replica lag spike.
  • Freeze critical writes if needed to prevent divergence.
  • Reduce load with backoff, circuit breakers, and load shedding.
  • Confirm leadership and membership view before resuming full traffic.
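
The load-reduction step often relies on a circuit breaker in front of a struggling dependency. The class below is a minimal sketch, not a production implementation. The thresholds, the method names, and the injectable `clock` parameter are assumptions made so the example stays small and testable.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency until a cool-down has passed."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a request may be sent to the dependency."""
        if self.opened_at is None:
            return True
        # After the cool-down, let trial requests through (half-open).
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cool-down.
            self.opened_at = self.clock()
```

Callers check `allow()` before each request and report the outcome with `record_success()` or `record_failure()`. A failure during the half-open phase restarts the cool-down, so a dependency that is still partitioned is not hit with full traffic.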

Production Checklist

  • Define which operations require strong consistency.
  • Document behavior under partition: reject, degrade, or reconcile.
  • Test partition scenarios and validate recovery steps.
  • Monitor leader health, quorum success rate, and replica lag.