Load Shedding (Fail Small, Not Catastrophically)

Load shedding protects system stability by deliberately rejecting excess traffic under high load. This lesson explains admission control, priority-based rejection, overload signals, and production-safe degradation strategies.

Load Shedding: Failing Fast to Preserve System Stability

Load shedding is the deliberate rejection of excess work when a system approaches or exceeds its capacity limits. Instead of allowing overload to propagate and collapse the entire system, load shedding sacrifices some requests to preserve overall availability.

In distributed systems, overload is inevitable. The question is not whether overload happens, but how the system responds when it does.

The Core Problem: Overload Amplification

When traffic exceeds capacity:

Queues grow without bound.
Latency increases.
Timeouts trigger retries.
Retries amplify load further.
Eventually, the system crashes.

Without load shedding, overload spreads across services and creates cascading failure.
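The amplification step can be made concrete with a little arithmetic. The sketch below is an illustration under simplifying assumptions, not a model of any particular system: every attempt fails independently with the same probability, and clients retry immediately up to a fixed number of times. The function name and parameters are hypothetical.

```python
def amplified_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Total request rate when every failed attempt is retried immediately.

    Attempt i (0 is the original request) is only made if the previous
    i attempts all failed, so it contributes base_rps * failure_rate**i.
    """
    return sum(base_rps * failure_rate ** i for i in range(max_retries + 1))


# 1000 req/s of real demand, up to 3 retries per request:
print(amplified_load(1000, 0.0, 3))  # healthy system: 1000.0
print(amplified_load(1000, 0.5, 3))  # half of all attempts time out: 1875.0
print(amplified_load(1000, 1.0, 3))  # every attempt times out: 4000.0
```

The worse the system performs, the more traffic it receives, which is why overload tends to feed itself.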

Fail Fast vs Fail Slow

Fail slow: accept all requests, increase latency, exhaust resources.
Fail fast: reject early, preserve core capacity.

Fail-fast systems degrade gracefully. Fail-slow systems collapse.

Production Scenario: Flash Traffic Event

Symptom

A traffic spike causes API latency to rise from 50 ms to several seconds. Shortly afterward, the service becomes unresponsive.

Root Cause

The service had no admission control or request rejection mechanism. Thread pools and connection pools saturated, which slowed every request, not only the excess ones.

Diagnosis

Queue depth continuously increasing.
CPU near 100 percent.
Retry amplification from upstream services.

Resolution

Introduce request rate limiting.
Implement priority-based admission control.
Reject low-priority requests under high load.

Load Shedding Strategies

1) Admission Control

Reject requests when queue length or concurrency exceeds threshold.

if active_requests > threshold:  # more work in flight than the service can handle
    reject_request()             # fail fast instead of queueing the request

This prevents queue explosion.
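A minimal runnable version of the same idea, assuming a threaded server where each request runs inside a handler function. The names `AdmissionController` and `Overloaded` are hypothetical; the semaphore calls are from Python's standard `threading` module.

```python
import threading


class Overloaded(Exception):
    """Raised when a request is rejected instead of queued."""


class AdmissionController:
    def __init__(self, max_in_flight: int) -> None:
        # Each permit represents one request allowed to run concurrently.
        self._permits = threading.BoundedSemaphore(max_in_flight)

    def run(self, handler, *args):
        # Non-blocking acquire: if no permit is free, reject immediately
        # rather than letting the request wait in a queue.
        if not self._permits.acquire(blocking=False):
            raise Overloaded("too many requests in flight")
        try:
            return handler(*args)
        finally:
            # Always return the permit, even if the handler raised.
            self._permits.release()


controller = AdmissionController(max_in_flight=2)
print(controller.run(lambda x: x * 2, 21))  # 42
```

A caller would typically catch `Overloaded` at the edge of the service and translate it into a rejection response.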

2) Priority-Based Shedding

Differentiate critical vs non-critical traffic.

Core transactions remain allowed.
Analytics, reporting, or background jobs are rejected first.

Preserving core business flows is the primary objective.
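One simple way to express this is a utilization cutoff per priority class. The sketch below is illustrative only: the class names and the cutoff values are assumptions, and a real system would pick them from load tests.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0    # e.g. core transactions
    NORMAL = 1      # ordinary interactive traffic
    BACKGROUND = 2  # e.g. analytics, reporting, background jobs


# Hypothetical cutoffs: the utilization (0.0 to 1.0) at which each class
# starts being rejected. Lower-priority traffic is shed first.
SHED_ABOVE = {
    Priority.BACKGROUND: 0.70,
    Priority.NORMAL: 0.85,
    Priority.CRITICAL: 1.00,
}


def should_admit(priority: Priority, utilization: float) -> bool:
    """Admit the request unless utilization has reached its class cutoff."""
    return utilization < SHED_ABOVE[priority]


print(should_admit(Priority.BACKGROUND, 0.75))  # False: shed first
print(should_admit(Priority.NORMAL, 0.75))      # True
print(should_admit(Priority.CRITICAL, 0.95))    # True: protected until saturation
```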

3) Adaptive Load Shedding

Adjust rejection thresholds dynamically based on:

CPU usage
Memory pressure
Latency percentiles
Error rates

Static thresholds may not reflect real-time pressure.
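As one possible approach, a concurrency limit can follow observed latency using additive increase and multiplicative decrease (AIMD). This is a sketch of that general technique, not a prescribed algorithm; the class name, the 20 percent cut, and the default bounds are all assumptions.

```python
class AdaptiveLimit:
    """Concurrency limit that follows observed latency (AIMD)."""

    def __init__(self, target_latency_s: float, initial: int = 100,
                 floor: int = 10, ceiling: int = 1000) -> None:
        self.target_latency_s = target_latency_s
        self.limit = initial
        self.floor = floor
        self.ceiling = ceiling

    def observe(self, p99_latency_s: float) -> int:
        """Update the limit from the latest P99 latency sample and return it."""
        if p99_latency_s > self.target_latency_s:
            # Overload signal: cut the limit by 20%, but never below the floor.
            self.limit = max(self.floor, self.limit * 4 // 5)
        else:
            # Healthy: probe for more capacity one slot at a time.
            self.limit = min(self.ceiling, self.limit + 1)
        return self.limit


limit = AdaptiveLimit(target_latency_s=0.2)
print(limit.observe(0.5))  # latency above target: 100 drops to 80
print(limit.observe(0.1))  # latency back under target: 80 rises to 81
```

The returned limit would feed an admission check such as the `active_requests > threshold` comparison, replacing the static threshold.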

4) Per-Client or Per-Tenant Limits

Prevent one tenant or client from monopolizing capacity.

This keeps heavy users from degrading service for everyone else.
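A common way to enforce such limits is one token bucket per tenant. The sketch below assumes a single process and is not thread-safe; `TokenBucket` and `TenantLimiter` are hypothetical names, and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time


class TokenBucket:
    """Allows `rate_per_s` requests per second with bursts up to `burst`."""

    def __init__(self, rate_per_s: float, burst: float, now: float) -> None:
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.tokens = burst  # start full so a new tenant can burst once
        self.updated = now

    def try_take(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at `burst`.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantLimiter:
    """One bucket per tenant, so one tenant's burst cannot drain another's."""

    def __init__(self, rate_per_s: float, burst: float, clock=time.monotonic) -> None:
        self._rate_per_s = rate_per_s
        self._burst = burst
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._rate_per_s, self._burst, now)
            self._buckets[tenant_id] = bucket
        return bucket.try_take(now)
```

With `TenantLimiter(rate_per_s=1.0, burst=2.0)`, a tenant that sends three requests at once has the third rejected, while a different tenant's first request is still admitted.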

Interaction with Other Patterns

Retries: must respect rejection signals and avoid immediate retry storms.
Circuit breakers: stop calling a downstream dependency that is already failing, so it is not overloaded further.
Bulkheads: isolate workloads before shedding occurs.
Timeouts: ensure that accepted work which runs too long is abandoned and releases its resources quickly.

Load shedding is part of a broader resilience strategy.

Choosing Shedding Signals

Common overload indicators:

Queue depth exceeding safe limit
Thread pool saturation
CPU above threshold
P99 latency above SLO target
Connection pool exhaustion

Shedding should trigger before complete resource exhaustion.
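The indicators above can be combined into a single check that fires while there is still headroom. This is a sketch with made-up field names and thresholds; each limit is deliberately set below full exhaustion (for example, 90 percent of the pool rather than 100 percent).

```python
from dataclasses import dataclass


@dataclass
class Signals:
    queue_depth: int
    pool_in_use: int
    pool_size: int
    cpu_fraction: float
    p99_latency_s: float


@dataclass
class Limits:
    # Hypothetical defaults; real values should come from load testing.
    max_queue_depth: int = 100
    max_pool_fraction: float = 0.9
    max_cpu_fraction: float = 0.85
    slo_p99_latency_s: float = 0.5


def overload_reasons(s: Signals, limits: Limits) -> list[str]:
    """Return the names of every signal that is past its limit."""
    reasons = []
    if s.queue_depth > limits.max_queue_depth:
        reasons.append("queue_depth")
    if s.pool_in_use > limits.max_pool_fraction * s.pool_size:
        reasons.append("pool_saturation")
    if s.cpu_fraction > limits.max_cpu_fraction:
        reasons.append("cpu")
    if s.p99_latency_s > limits.slo_p99_latency_s:
        reasons.append("p99_latency")
    return reasons


print(overload_reasons(Signals(150, 48, 50, 0.9, 0.1), Limits()))
# ['queue_depth', 'pool_saturation', 'cpu']
```

Returning the reasons rather than a bare boolean makes it easier to log why shedding started.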

Graceful Error Responses

Rejected requests should return clear signals:

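HTTP 429 (Too Many Requests), typically when a specific client or tenant has exceeded its limit
HTTP 503 (Service Unavailable), typically when the service as a whole is overloaded
A Retry-After header telling the client how long to wait before retrying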

Clients must be designed to respect these signals.
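On the client side, respecting these signals might look like the sketch below. It is an assumption-laden illustration: the function name and defaults are hypothetical, `headers` is taken to be a plain dict keyed by the exact string "Retry-After", and only the delay-in-seconds form of the header is handled (the HTTP-date form is ignored). When the server gives no usable hint, it falls back to exponential backoff with full jitter.

```python
import random


def retry_delay_s(status: int, headers: dict, attempt: int,
                  base_s: float = 0.5, cap_s: float = 30.0):
    """Return seconds to wait before retrying, or None if not a shed response."""
    if status not in (429, 503):
        return None
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        # The server said how long to wait; honor it.
        return float(retry_after)
    # No usable hint: pick a random delay in [0, min(cap, base * 2^attempt)]
    # so that many clients do not all retry at the same instant.
    return random.uniform(0.0, min(cap_s, base_s * 2 ** attempt))


print(retry_delay_s(429, {"Retry-After": "7"}, attempt=0))  # 7.0
print(retry_delay_s(200, {}, attempt=0))                    # None
```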

Observability Requirements

Rejection rate per endpoint
Active request count
Queue depth metrics
Latency percentiles
Retry amplification ratio

Load shedding must be visible and measurable.
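A minimal in-process sketch of two of these metrics follows. It assumes clients mark retried requests in some way the server can see (the `is_retry` flag is hypothetical), and it defines the retry amplification ratio as total attempts divided by first attempts, which is one reasonable definition rather than a standard one. In production these counters would normally be exported to a metrics system instead of held in memory.

```python
from collections import Counter


class SheddingMetrics:
    def __init__(self) -> None:
        self.accepted = Counter()  # admitted requests per endpoint
        self.rejected = Counter()  # shed requests per endpoint
        self.attempts = 0          # every request seen, retries included
        self.first_attempts = 0    # requests not marked as retries

    def record(self, endpoint: str, admitted: bool, is_retry: bool) -> None:
        (self.accepted if admitted else self.rejected)[endpoint] += 1
        self.attempts += 1
        if not is_retry:
            self.first_attempts += 1

    def rejection_rate(self, endpoint: str) -> float:
        total = self.accepted[endpoint] + self.rejected[endpoint]
        return self.rejected[endpoint] / total if total else 0.0

    def retry_amplification(self) -> float:
        # 1.0 means no retries; 2.0 means one retry per original request.
        return self.attempts / self.first_attempts if self.first_attempts else 0.0
```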

Failure Injection Test

# Load shedding validation
1) Gradually increase request rate beyond capacity
2) Observe queue depth and latency growth
3) Confirm load shedding triggers at threshold
4) Verify critical endpoints remain stable
5) Measure rejection percentage under overload
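Before running this against a real service, the expected shape of the result can be rehearsed with a toy simulation. The sketch below models a server that processes a fixed number of requests per tick behind a bounded queue; it is a simplified stand-in with hypothetical names and numbers, and it does not cover step 4 (per-endpoint stability).

```python
def ramp_test(capacity_per_tick: int, max_queue: int, rates: list[int]) -> list[dict]:
    """Simulate a bounded-queue server under a stepped arrival rate."""
    queue_depth = 0
    results = []
    for rate in rates:  # each rate is the number of arrivals in one tick
        rejected = 0
        for _ in range(rate):
            if queue_depth >= max_queue:
                rejected += 1      # shed: the queue is at its limit
            else:
                queue_depth += 1   # admit: the request joins the queue
        # The server drains up to its capacity at the end of the tick.
        queue_depth -= min(capacity_per_tick, queue_depth)
        results.append({
            "rate": rate,
            "queue_depth_after": queue_depth,
            "rejected_pct": 100.0 * rejected / rate,
        })
    return results


for row in ramp_test(capacity_per_tick=10, max_queue=20, rates=[5, 10, 20, 40]):
    print(row)
```

With these inputs nothing is rejected until the last step, where the arrival rate is four times capacity and 75 percent of requests are shed while the queue depth stays within its limit.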

Common Anti-Patterns

Accepting all traffic without limits
Rejecting randomly without prioritization
No backpressure signal to clients
Retrying immediately after rejection
No monitoring of rejection metrics

Operational Checklist

Are overload thresholds clearly defined?
Are critical paths protected from shedding?
Do clients respect rejection responses?
Is load shedding behavior tested under stress?
Are rejection metrics part of SLO monitoring?

Key Takeaways

Load shedding prevents cascading overload.
Fail-fast behavior is safer than fail-slow behavior.
Admission control protects system capacity.
Priority-based shedding preserves core functionality.
Shedding must integrate with retries, timeouts, and circuit breakers.

Load shedding is a deliberate sacrifice of some requests to save the whole system. In production-grade distributed systems, controlled rejection is a feature, not a failure.
