Cache Invalidation and Consistency

Invalidation is the hardest part: choose TTLs, versioning, or event-driven invalidation based on correctness requirements and failure modes.

Why Invalidation Is Hard

Caching breaks the default guarantee that reads reflect the latest write. Invalidation is hard because updates can happen from multiple paths, failures can interrupt invalidation, and distributed systems do not give you perfect ordering for free.

Three Common Approaches

  • TTL-based: expire after a duration; simplest, but serves stale data until expiry.
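  • Write-through / write-around: write-through updates the cache on every write; write-around writes only to the backing store and leaves the cache to be refilled on a later read. Write-through reduces staleness but complicates the write path; write-around still needs a TTL or explicit invalidation to clear any older cached copy.
  • Event-driven invalidation: publish change events that invalidate the affected keys; staleness drops to roughly the event delivery delay, but it requires reliable delivery and idempotent handlers.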
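To make two of these approaches concrete, here is a minimal in-memory sketch that combines TTL expiry with an idempotent invalidation handler. The class name `TTLCache`, the `handle_invalidation` method, and the idea of deduplicating on an event ID are assumptions made for illustration, not part of any particular cache library.

```python
import time
from typing import Any, Callable, Dict, Optional, Set, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._seen_events: Set[str] = set()  # IDs of events already applied

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def handle_invalidation(self, event_id: str, key: str) -> bool:
        """Apply an invalidation event once; redelivered events are ignored."""
        if event_id in self._seen_events:
            return False  # duplicate delivery: do not evict a freshly refilled entry
        self._seen_events.add(event_id)
        self._entries.pop(key, None)  # deleting a missing key is a no-op
        return True
```

In this sketch the set of seen event IDs grows without bound; a real handler would likely expire old IDs or rely on the delete itself being harmless to repeat.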

Choose Based on Correctness Needs

Not all data requires strong freshness. Define freshness requirements per endpoint. For many user-visible reads, a few seconds of staleness may be acceptable. For billing or permissions, even that may not be.
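One way to record per-endpoint requirements is a small policy table that the caching layer consults. The endpoint names and the staleness budgets below are hypothetical values chosen only to illustrate the idea.

```python
from typing import Dict, Optional

# Hypothetical endpoints and staleness budgets, in seconds.
# None means "never serve this endpoint from cache".
MAX_STALENESS_SECONDS: Dict[str, Optional[int]] = {
    "GET /profile": 30,
    "GET /feed": 5,
    "GET /billing/balance": None,
    "GET /permissions": None,
}


def cache_ttl_for(endpoint: str) -> Optional[int]:
    """Return the TTL to use for an endpoint, or None to bypass the cache."""
    # Endpoints missing from the table also bypass the cache, so a new
    # endpoint is uncached until someone states its freshness requirement.
    return MAX_STALENESS_SECONDS.get(endpoint)
```

Defaulting unknown endpoints to "no cache" is a design choice in this sketch: it trades some load for not serving stale data by accident.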

Versioned Keys

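Versioning is a practical invalidation technique: instead of deleting keys, you change the key version when the underlying data changes. Readers build the key from the current version, so they never look up entries written under an older version, and those entries age out through TTL or eviction. This avoids races where old invalidations arrive late.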

Example:
user:123:v1
When user profile changes -> bump to v2
user:123:v2
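The example above can be sketched in code as follows. This is an illustrative in-memory version: the `VersionedCache` class and its methods are hypothetical names, and a real deployment would need to keep the current version somewhere every reader can see it.

```python
from typing import Any, Callable, Dict


class VersionedCache:
    """Caches values under keys like 'user:123:v2'; bumping the version invalidates."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}  # entity -> current version (starts at 1)
        self._data: Dict[str, Any] = {}      # versioned key -> cached value

    def _key(self, entity: str) -> str:
        return f"{entity}:v{self._versions.get(entity, 1)}"

    def get(self, entity: str, load: Callable[[], Any]) -> Any:
        key = self._key(entity)
        if key not in self._data:
            # Miss under the current version: load from the source of truth.
            self._data[key] = load()
        return self._data[key]

    def bump(self, entity: str) -> str:
        """Call when the underlying data changes; returns the new key."""
        self._versions[entity] = self._versions.get(entity, 1) + 1
        # Old versioned entries are left in place; in a real cache they
        # would expire via TTL or be evicted.
        return self._key(entity)
```

A possible usage: call `cache.get("user:123", load_user)` on reads and `cache.bump("user:123")` after a profile update, where `load_user` stands in for whatever function reads the database.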

Multi-Key and Fan-Out Problems

Some writes affect many cached reads: updating a post can affect feeds, search indexes, and aggregates. Production-first systems avoid invalidating thousands of keys synchronously in the write path. They prefer background workers, event-driven refresh, or bounded invalidation strategies.
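As a rough sketch of keeping fan-out off the write path, the write can enqueue the affected keys and let a background worker invalidate them in bounded batches. The `InvalidationQueue` class and the plain dict standing in for the cache are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List


class InvalidationQueue:
    """Defers key invalidation so the write path only enqueues keys."""

    def __init__(self, cache: Dict[str, object], batch_size: int = 100) -> None:
        self._cache = cache            # a dict stands in for the real cache client
        self._batch_size = batch_size
        self._pending: Deque[str] = deque()

    def enqueue(self, keys: Iterable[str]) -> None:
        # Called from the write path: appends only, no cache calls.
        self._pending.extend(keys)

    def drain_one_batch(self) -> List[str]:
        """Invalidate up to batch_size keys; intended for a background worker."""
        batch: List[str] = []
        while self._pending and len(batch) < self._batch_size:
            key = self._pending.popleft()
            self._cache.pop(key, None)  # deleting a missing key is a no-op
            batch.append(key)
        return batch
```

Because this queue lives in memory, pending invalidations are lost if the process crashes; a durable queue or a TTL backstop would be needed to bound staleness in that case.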

Failure Modes You Must Plan For

  • Cache is down: does your app fail open (go to DB) or fail closed?
  • Invalidation fails: how long can stale data persist?
  • Replication lag: read replicas may be behind writes, compounding staleness.
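For the first failure mode, a fail-open read path might look like the sketch below. `CacheUnavailable` is a hypothetical exception standing in for whatever error a real cache client raises, and the cache and database are passed in as plain callables to keep the example self-contained.

```python
from typing import Any, Callable, Optional


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cannot be reached."""


def read_fail_open(
    key: str,
    cache_get: Callable[[str], Optional[Any]],
    cache_set: Callable[[str, Any], None],
    db_load: Callable[[str], Any],
) -> Any:
    """Serve from cache when possible; go to the database if the cache is down."""
    try:
        cached = cache_get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        # Cache is down: fail open and skip the refill attempt.
        return db_load(key)

    value = db_load(key)  # ordinary cache miss
    try:
        cache_set(key, value)
    except CacheUnavailable:
        pass  # refill is best-effort; the read still succeeds
    return value
```

Failing open keeps reads available but sends the full read load to the database, so it is worth checking that the database can absorb that load before choosing it over failing closed.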

Production-First Takeaway

Pick the simplest invalidation strategy that meets correctness requirements. Use TTLs for low-risk data, versioned keys to avoid races, and event-driven invalidation when freshness is critical and you can operate the pipeline reliably.