Load Shedding (Fail Small, Not Catastrophically)
Load Shedding: Failing Fast to Preserve System Stability
Load shedding is the deliberate rejection of excess work when a system approaches or exceeds its capacity limits. Instead of allowing overload to propagate and collapse the entire system, load shedding sacrifices some requests to preserve overall availability.
In distributed systems, overload is inevitable. The question is not whether overload happens, but how the system responds when it does.
The Core Problem: Overload Amplification
When traffic exceeds capacity:
- Queues grow without bound.
- Latency increases.
- Timeouts trigger retries.
- Retries amplify load further.
- Eventually, the system crashes.
Without load shedding, overload spreads across services and creates cascading failure.
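The amplification effect of retries can be estimated with simple arithmetic. The sketch below is a rough model, not a measurement: it assumes every attempt fails independently with the same probability and that clients retry each failure up to a fixed number of times. The names `effective_load`, `failure_rate`, and `max_retries` are illustrative.

```python
def effective_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Expected attempts per second once retries are included.

    Attempt k (0-based) only happens if the previous k attempts all failed,
    so the expected attempts per request is the sum of failure_rate**k
    for k = 0 .. max_retries.
    """
    attempts_per_request = sum(failure_rate ** k for k in range(max_retries + 1))
    return base_rps * attempts_per_request


# Assumed numbers for illustration only.
healthy = effective_load(1000, failure_rate=0.0, max_retries=3)
collapsing = effective_load(1000, failure_rate=1.0, max_retries=3)
```

Under these assumptions a healthy system sees its base load of 1,000 requests per second, while a system that fails every attempt sees 4,000: the worse the overload, the more traffic the retries add.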
Fail Fast vs Fail Slow
- Fail slow: accept all requests, increase latency, exhaust resources.
- Fail fast: reject early, preserve core capacity.
Fail-fast systems degrade gracefully. Fail-slow systems collapse.
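One way to illustrate the difference is a bounded queue that refuses new work when full instead of blocking or growing. This is a minimal sketch using Python's standard `queue` module; the `submit` helper and the queue size are assumptions chosen for the example.

```python
import queue


def submit(work_queue: queue.Queue, item) -> bool:
    """Fail fast: enqueue without blocking, and report rejection when full."""
    try:
        work_queue.put_nowait(item)
        return True
    except queue.Full:
        # The caller can turn this into an immediate rejection response
        # instead of letting the request wait and time out.
        return False


# A bounded queue (hypothetical size). An unbounded queue would be fail-slow.
pending = queue.Queue(maxsize=2)
results = [submit(pending, n) for n in range(3)]
```

The third submission is rejected immediately, so the cost of overload is a fast error for one caller rather than growing latency for all of them.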
Production Scenario: Flash Traffic Event
Symptom
A traffic spike causes API latency to increase from 50ms to several seconds. Shortly after, the service becomes unresponsive.
Root Cause
The service had no admission control or request rejection mechanism. Thread pools and connection pools became saturated, causing a global slowdown.
Diagnosis
- Queue depth continuously increasing.
- CPU near 100 percent.
- Retry amplification from upstream services.
Resolution
- Introduce request rate limiting.
- Implement priority-based admission control.
- Reject low-priority requests under high load.
Load Shedding Strategies
1) Admission Control
Reject requests when queue length or concurrency exceeds a configured threshold.
if active_requests > threshold:
    reject_request()
This prevents queue explosion.
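A slightly fuller version of the check above might track in-flight requests with a thread-safe counter. This is a sketch under assumptions: `AdmissionController`, `handle`, and the 503 status used for rejection are illustrative choices, not a prescribed implementation.

```python
import threading


class AdmissionController:
    """Caps the number of requests being processed at the same time."""

    def __init__(self, max_active: int):
        self._max_active = max_active
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Check and increment under one lock so concurrent callers
        # cannot both take the last slot.
        with self._lock:
            if self._active >= self._max_active:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        with self._lock:
            if self._active > 0:
                self._active -= 1


def handle(controller: AdmissionController, work):
    """Run `work` if admitted, otherwise reject without queueing."""
    if not controller.try_acquire():
        return 503, None
    try:
        return 200, work()
    finally:
        # Always free the slot, even if the work raises.
        controller.release()
```

Releasing in a `finally` block matters here: a slot leaked on every exception would slowly shrink capacity until everything is rejected.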
2) Priority-Based Shedding
Differentiate critical vs non-critical traffic.
- Core transactions remain allowed.
- Analytics, reporting, or background jobs are rejected first.
Preserving core business flows is the primary objective.
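One simple way to express this is a per-class utilization cutoff, so lower-priority traffic is shed earlier as load rises. The priority classes and cutoff values below are hypothetical and would need tuning for a real workload.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0    # e.g. core transactions
    NORMAL = 1
    BACKGROUND = 2  # e.g. analytics, reporting, background jobs


# Assumed cutoffs: a class is shed once utilization reaches its value.
SHED_AT = {
    Priority.BACKGROUND: 0.70,
    Priority.NORMAL: 0.90,
    Priority.CRITICAL: 1.00,
}


def should_admit(priority: Priority, utilization: float) -> bool:
    """Admit only while utilization (0.0 to 1.0) is below the class cutoff."""
    return utilization < SHED_AT[priority]
```

At 75 percent utilization this sketch rejects background work but still admits normal and critical requests; critical requests are refused only when the system is completely full.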
3) Adaptive Load Shedding
Adjust rejection thresholds dynamically based on:
- CPU usage
- Memory pressure
- Latency percentiles
- Error rates
Static thresholds may not reflect real-time pressure.
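A common way to adapt a limit is additive increase with multiplicative decrease: raise the concurrency limit slowly while latency is healthy, and cut it sharply when latency exceeds a target. The sketch below uses only a latency signal and assumed parameter values; `AdaptiveLimit` is an illustrative name, and a production controller would likely combine several of the signals listed above.

```python
class AdaptiveLimit:
    """Concurrency limit adjusted from observed latency (AIMD-style sketch)."""

    def __init__(self, initial=100, min_limit=10, max_limit=1000,
                 target_latency_ms=200.0, backoff=0.9):
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.target_latency_ms = target_latency_ms
        self.backoff = backoff  # multiplicative decrease factor, below 1.0

    def update(self, observed_p99_ms: float) -> int:
        """Feed one latency sample window and return the new limit."""
        if observed_p99_ms > self.target_latency_ms:
            # Under pressure: shrink quickly, but never below the floor.
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # Healthy: probe for more capacity one slot at a time.
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

The resulting limit can be used as the threshold for an admission check, so the rejection point moves with real-time pressure instead of staying fixed.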
4) Per-Client or Per-Tenant Limits
Prevent one tenant or client from monopolizing capacity.
This isolates heavy users from impacting others.
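Per-tenant limits are often implemented with one token bucket per tenant. The following is a minimal single-process sketch; `TokenBucket`, `TenantLimiter`, and the rate and capacity values are assumptions, and a multi-instance deployment would need shared state that is not shown here.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantLimiter:
    """Keeps an independent bucket per tenant so one cannot starve others."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock  # injectable so tests can control time
        self._buckets = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)
```

Because each tenant draws from its own bucket, a tenant that exhausts its tokens is rejected while other tenants continue to be admitted.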
Interaction with Other Patterns
- Retries: must respect rejection signals and avoid immediate retry storms.
- Circuit breakers: prevent downstream overload.
- Bulkheads: isolate workloads before shedding occurs.
- Timeouts: ensure rejected work releases resources quickly.
Load shedding is part of a broader resilience strategy.
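On the client side, respecting rejection signals usually means waiting before retrying, with exponential backoff and jitter so that rejected clients do not all return at the same moment. The helper below is a sketch; `retry_delay` and its default values are assumptions, and `retry_after_s` stands for a server-provided hint already parsed into seconds.

```python
import random


def retry_delay(attempt: int, retry_after_s=None,
                base_s: float = 0.1, cap_s: float = 30.0,
                rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Uses capped exponential backoff with full jitter, and never waits
    less than the server's hint when one was provided.
    """
    backoff = min(cap_s, base_s * (2 ** attempt))
    jittered = backoff * rng()  # rng() is in [0, 1), spreading retries out
    if retry_after_s is not None:
        return max(retry_after_s, jittered)
    return jittered
```

The `rng` parameter exists only so the jitter can be made deterministic in tests; normal callers would leave it at the default.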
Choosing Shedding Signals
Common overload indicators:
- Queue depth exceeding safe limit
- Thread pool saturation
- CPU above threshold
- P99 latency above SLO target
- Connection pool exhaustion
Shedding should trigger before complete resource exhaustion.
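The indicators above can be combined into a single check that reports which limits are currently breached. This is a sketch with hypothetical threshold values set below full exhaustion, so that shedding starts early; `Signals`, `Limits`, and `overload_reasons` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signals:
    queue_depth: int
    cpu_utilization: float  # 0.0 to 1.0
    p99_latency_ms: float
    pool_in_use: int
    pool_size: int


@dataclass
class Limits:
    # Assumed values for illustration; tune against real capacity tests.
    max_queue_depth: int = 500
    max_cpu: float = 0.85
    p99_slo_ms: float = 300.0
    max_pool_fraction: float = 0.9


def overload_reasons(s: Signals, limits: Optional[Limits] = None) -> List[str]:
    """Return the names of all breached signals (empty list means healthy)."""
    limits = limits or Limits()
    reasons = []
    if s.queue_depth > limits.max_queue_depth:
        reasons.append("queue_depth")
    if s.cpu_utilization > limits.max_cpu:
        reasons.append("cpu")
    if s.p99_latency_ms > limits.p99_slo_ms:
        reasons.append("p99_latency")
    if s.pool_size > 0 and s.pool_in_use / s.pool_size > limits.max_pool_fraction:
        reasons.append("pool")
    return reasons
```

Returning the reasons rather than a bare boolean makes it easier to log why a request was shed.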
Graceful Error Responses
Rejected requests should return clear signals:
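- HTTP 429 (Too Many Requests) when a specific client has exceeded its limit
- HTTP 503 (Service Unavailable) when the service as a whole is overloaded
- An explicit Retry-After header telling clients when to try again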
Clients must be designed to respect these signals.
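A rejection response along these lines could be built as follows. This is a framework-neutral sketch: `rejection_response` is a hypothetical helper that returns a status code and headers, and the rule for choosing between 429 and 503 is one reasonable convention rather than a requirement.

```python
import math


def rejection_response(client_over_quota: bool, retry_after_s: float):
    """Build the status and headers for a shed request.

    429 signals that this client exceeded its own limit; 503 signals
    that the service as a whole is overloaded.
    """
    status = 429 if client_over_quota else 503
    # Retry-After accepts a whole number of seconds, so round up and
    # avoid telling clients to retry immediately.
    delay = max(1, math.ceil(retry_after_s))
    headers = {"Retry-After": str(delay)}
    return status, headers
```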
Observability Requirements
- Rejection rate per endpoint
- Active request count
- Queue depth metrics
- Latency percentiles
- Retry amplification ratio
Load shedding must be visible and measurable.
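As an illustration of how two of these metrics can be derived, the sketch below keeps in-memory counters and computes a per-endpoint rejection rate and a retry amplification ratio. `ShedMetrics` is a hypothetical class, and it assumes the server can tell retries from first attempts (for example through a client-set marker), which the source does not specify. A real service would export these counters to its metrics system.

```python
from collections import Counter


class ShedMetrics:
    """In-memory counters for shedding visibility (illustrative only)."""

    def __init__(self):
        self.accepted = Counter()
        self.rejected = Counter()
        self.first_attempts = 0
        self.total_attempts = 0

    def record(self, endpoint: str, admitted: bool, is_retry: bool = False) -> None:
        if admitted:
            self.accepted[endpoint] += 1
        else:
            self.rejected[endpoint] += 1
        self.total_attempts += 1
        if not is_retry:
            self.first_attempts += 1

    def rejection_rate(self, endpoint: str) -> float:
        """Fraction of requests to `endpoint` that were shed."""
        total = self.accepted[endpoint] + self.rejected[endpoint]
        return self.rejected[endpoint] / total if total else 0.0

    def retry_amplification(self) -> float:
        """Total attempts divided by first attempts (1.0 means no retries)."""
        if self.first_attempts == 0:
            return 0.0
        return self.total_attempts / self.first_attempts
```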
Failure Injection Test
# Load shedding validation
1) Gradually increase request rate beyond capacity
2) Observe queue depth and latency growth
3) Confirm load shedding triggers at threshold
4) Verify critical endpoints remain stable
5) Measure rejection percentage under overload
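Before running such a test against a real service, the expected shape of the results can be sanity-checked with a toy simulation. The sketch below is a discrete-time model, not a load-testing tool: `simulate_ramp`, the capacity, the queue limit, and the request rates are all assumed values.

```python
def simulate_ramp(capacity_per_tick: int, max_queue: int, rates):
    """Simulate a bounded queue under a ramping arrival rate.

    Each tick: admit arrivals while the queue has room, reject the rest,
    then serve up to `capacity_per_tick` queued requests.
    Returns one (offered, rejected, queue_depth_after) tuple per tick.
    """
    depth = 0
    results = []
    for offered in rates:
        admitted = min(offered, max_queue - depth)
        rejected = offered - admitted
        depth += admitted
        depth -= min(depth, capacity_per_tick)
        results.append((offered, rejected, depth))
    return results


# Ramp from below capacity (10 per tick) to four times capacity.
ramp = simulate_ramp(capacity_per_tick=10, max_queue=20, rates=[5, 10, 20, 40])
rejected_pct = [100.0 * rej / off for off, rej, _ in ramp]
```

In this model the queue depth stays bounded while the rejection percentage rises with the offered load, which is the behavior the validation steps above are meant to confirm in a real system.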
Common Anti-Patterns
- Accepting all traffic without limits
- Rejecting randomly without prioritization
- No backpressure signal to clients
- Retrying immediately after rejection
- No monitoring of rejection metrics
Operational Checklist
- Are overload thresholds clearly defined?
- Are critical paths protected from shedding?
- Do clients respect rejection responses?
- Is load shedding behavior tested under stress?
- Are rejection metrics part of SLO monitoring?
Key Takeaways
- Load shedding prevents cascading overload.
- Fail-fast behavior is safer than fail-slow behavior.
- Admission control protects system capacity.
- Priority-based shedding preserves core functionality.
- Shedding must integrate with retries, timeouts, and circuit breakers.
Load shedding is a deliberate sacrifice of some requests to save the whole system. In production-grade distributed systems, controlled rejection is a feature, not a failure.