Read-Heavy Optimization

Layering caches, read replicas, and measuring hit rate vs correctness impact.

On this page

Read-Heavy Optimization: Make Reads Cheap Without Breaking Correctness

Read-heavy systems fail when reads are treated as “free.” At scale, reads become the dominant cost driver, and the database becomes the bottleneck through IO, CPU, locks, or connection limits.

Production optimization is about moving read load away from the primary database while keeping correctness guarantees where needed.

Start With Read Path Decomposition

For each hot endpoint, map the read path:

Does it hit cache?
If miss, does it hit replica or primary?
Does it do joins/aggregates?
Does it paginate efficiently?
Is the access pattern skewed (hot keys)?

Read-heavy tuning is ineffective without a read-path diagram.

Cache Hierarchy: L1/L2 Caching

Common production pattern:

L1: in-process cache (very fast, small, per-instance)
L2: Redis/shared cache (fast, shared across fleet)
DB replicas: for cache misses and larger queries

Benefits:

Absorb hot reads at lowest latency (L1)
Share warm cache across instances (L2)
Reduce load on primary (replicas)

Failure mode: cache layers hide DB problems until a cold start causes sudden DB overload.

Read Replicas: Scale Reads, Not Guarantees

Replicas help scale reads but introduce staleness.

Production rules:

Decide which endpoints can tolerate replica lag.
Implement read-your-writes for critical flows (session pinning to primary).
Monitor replica lag and stop routing reads when lag exceeds threshold.

Pagination Discipline: Avoid OFFSET at Scale

OFFSET pagination gets slower with page number because the DB must scan and discard rows.

Production guidance:

Prefer cursor-based pagination (seek method) for large datasets.
Use stable ordering keys (id, created_at + id).
Cache first pages aggressively; deep pages are often rare but expensive.

Read-heavy systems often collapse due to expensive deep pagination queries.

Precompute and Denormalize for Hot Reads

If a read endpoint does heavy joins and aggregates, you can move work to write-time or background processing:

Materialize summaries (counts, top lists)
Denormalize frequently joined attributes
Maintain read models updated by events (CQRS-like patterns)

Tradeoff: write complexity increases. You need backfills and reconciliation.

Request Shaping: Control Fanout and Payload Size

Read-heavy traffic often includes waste:

Over-fetching (large payloads)
High fanout (one request triggers multiple downstream reads)
Unbounded filters (wide scans)

Production controls:

Enforce limits and maximum page sizes
Cache compressed representations if payloads are large
Use “fields” selection or separate endpoints to reduce payload

Hot Key Management

Read-heavy systems often have hot keys:

Homepage feed
Global counters
Top trending list

Mitigations:

Cache with soft TTL + refresh-ahead
Request coalescing (single-flight)
Shard the key (bucketed keys) when safe
Serve slightly stale data intentionally

Indexing Still Matters

Caching reduces load, but misses still occur. Ensure DB queries are index-supported:

Composite indexes aligned with filter + sort
Avoid broad scans for hot endpoints
Keep stats fresh to prevent bad plans

Production rule: cache should complement indexes, not replace them.

Fallback Controls: Protect the Primary

If cache or replicas degrade, traffic falls back to primary. Without controls, this creates cascading failure.

Protections:

Rate limit DB fallback
Circuit breakers per endpoint
Serve degraded responses for non-critical endpoints
Prioritize critical reads over best-effort reads

Failure Modes in Production

Replica lag incident: reads served stale, user workflows break.
Offset pagination meltdown: deep pages trigger huge scans.
Cold cache surge: cache restart floods DB.
Hot key overload: one key saturates Redis or causes single-flight pileups.
Precompute drift: denormalized read model becomes inconsistent.
Fallback cascade: cache miss + retry storms overload primary.

Operational Checklist

Map hot endpoints and their read paths (cache → replica → primary).
Implement cache hierarchy (L1/L2) for ultra-hot reads.
Define replica routing rules and enforce lag thresholds.
Use cursor-based pagination; avoid OFFSET on large datasets.
Precompute/denormalize heavy reads; build backfill and reconciliation.
Control request payload size and fanout; enforce limits.
Handle hot keys with soft TTL, refresh-ahead, and single-flight.
Keep DB indexes aligned with miss queries; monitor plan regressions.
Protect primary with fallback rate limits and circuit breakers.
Load test failure scenarios: cache restart, replica lag, traffic spikes.

Summary

Read-heavy scaling is achieved by shaping the read path: cache hierarchy, replicas with explicit staleness policy, pagination discipline, hot key management, and precomputed read models for expensive joins/aggregates. The primary goal is not only speed but preventing cascading failures when caches or replicas degrade.

← TTL Design Strategies

Cache Consistency Models →