Load Shedding (Fail Small, Not Catastrophically)

Load shedding protects system stability by deliberately rejecting excess traffic under high load. This lesson explains admission control, priority-based rejection, overload signals, and production-safe degradation strategies.

Load Shedding: Failing Fast to Preserve System Stability

Load shedding is the deliberate rejection of excess work when a system approaches or exceeds its capacity limits. Instead of allowing overload to propagate and collapse the entire system, load shedding sacrifices some requests to preserve overall availability.

In distributed systems, overload is inevitable. The question is not whether overload happens, but how the system responds when it does.

The Core Problem: Overload Amplification

When traffic exceeds capacity:

  • Queues grow without bound.
  • Latency increases.
  • Timeouts trigger retries.
  • Retries amplify load further.
  • Eventually, the system crashes.

Without load shedding, overload spreads across services and creates cascading failure.
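The amplification step can be put in rough numbers. The sketch below is a simplification, not a model of any particular system: it assumes every attempt fails independently with the same probability and that clients retry immediately, up to a fixed limit. The function names are illustrative.

```python
def expected_attempts(failure_prob: float, max_retries: int) -> float:
    """Expected attempts per original request.

    Assumes each attempt fails independently with probability failure_prob
    and that a failed attempt is retried until max_retries is used up.
    """
    if not 0.0 <= failure_prob <= 1.0:
        raise ValueError("failure_prob must be in [0, 1]")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    # Attempt k (0-based) only happens if the previous k attempts all failed.
    return sum(failure_prob ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, failure_prob: float, max_retries: int) -> float:
    """Request rate the service actually sees once retries are included."""
    return base_rps * expected_attempts(failure_prob, max_retries)
```

Under these assumptions, 1,000 requests per second with half of all attempts timing out and two retries allowed becomes 1,750 requests per second at the server, which pushes the failure rate higher still.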

Fail Fast vs Fail Slow

  • Fail slow: accept all requests, increase latency, exhaust resources.
  • Fail fast: reject early, preserve core capacity.

Fail-fast systems degrade gracefully. Fail-slow systems collapse.

Production Scenario: Flash Traffic Event

Symptom

A traffic spike causes API latency to increase from 50 ms to several seconds. Shortly afterwards, the service becomes unresponsive.

Root Cause

The service had no admission control or request rejection mechanism. Thread pools and connection pools saturated, causing a global slowdown.

Diagnosis

  • Queue depth continuously increasing.
  • CPU near 100 percent.
  • Retry amplification from upstream services.

Resolution

  • Introduce request rate limiting.
  • Implement priority-based admission control.
  • Reject low-priority requests under high load.

Load Shedding Strategies

1) Admission Control

Reject requests when queue length or concurrency exceeds a threshold.

# Reject up front, before the request consumes a worker or queue slot.
if active_requests > threshold:
    reject_request()

This prevents queue explosion.
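The pseudocode above can be fleshed out into a small thread-safe concurrency limiter. This is a minimal sketch: `AdmissionController`, `handle`, and the `"rejected"` marker are illustrative names, and a real service would map the rejection to an error response instead of a string.

```python
import threading


class AdmissionController:
    """Caps in-flight requests; excess requests are rejected, not queued."""

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight <= 0:
            raise ValueError("max_in_flight must be positive")
        self._max = max_in_flight
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Returns True and takes a slot if one is free, else False."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Frees a slot previously taken by try_acquire."""
        with self._lock:
            if self._active > 0:
                self._active -= 1

    @property
    def active(self) -> int:
        with self._lock:
            return self._active


def handle(controller: AdmissionController, work):
    """Runs work() if admitted; otherwise returns a rejection marker at once."""
    if not controller.try_acquire():
        return "rejected"
    try:
        return work()
    finally:
        # Always free the slot, even if work() raises.
        controller.release()
```

The check and the increment happen under one lock so that two concurrent requests cannot both take the last slot.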

2) Priority-Based Shedding

Differentiate critical vs non-critical traffic.

  • Core transactions remain allowed.
  • Analytics, reporting, or background jobs are rejected first.

Preserving core business flows is the primary objective.
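One simple way to express this is a utilization cutoff per priority class, so lower classes are shed earlier. The class names and cutoff values below are hypothetical and would need tuning for a real workload.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0    # e.g. core transactions
    NORMAL = 1
    BACKGROUND = 2  # e.g. analytics, reporting, background jobs


# Hypothetical cutoffs: a class is shed once utilization reaches its value.
SHED_AT = {
    Priority.BACKGROUND: 0.70,
    Priority.NORMAL: 0.90,
    Priority.CRITICAL: 1.00,
}


def should_admit(priority: Priority, active: int, capacity: int) -> bool:
    """Admits a request only while utilization is below its class cutoff."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = active / capacity
    return utilization < SHED_AT[priority]
```

With these values, background work is rejected once the service is 70 percent busy, while core transactions are admitted until capacity is fully used.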

3) Adaptive Load Shedding

Adjust rejection thresholds dynamically based on:

  • CPU usage
  • Memory pressure
  • Latency percentiles
  • Error rates

Static thresholds may not reflect real-time pressure.
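As an illustration, the sketch below adjusts a concurrency limit from observed latency using additive increase and multiplicative decrease (AIMD). AIMD is one common choice and is used here as an assumption, not as the method this lesson prescribes; the class name and parameters are illustrative.

```python
class AdaptiveLimit:
    """Concurrency limit that follows observed latency (AIMD sketch)."""

    def __init__(self, initial: int, min_limit: int, max_limit: int,
                 target_latency_ms: float, backoff: float = 0.5) -> None:
        if not min_limit <= initial <= max_limit:
            raise ValueError("initial must lie between min_limit and max_limit")
        if not 0.0 < backoff < 1.0:
            raise ValueError("backoff must be in (0, 1)")
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.target_latency_ms = target_latency_ms
        self.backoff = backoff

    def update(self, observed_p99_ms: float) -> int:
        """Feeds in one latency sample window and returns the new limit."""
        if observed_p99_ms > self.target_latency_ms:
            # Over target: cut the limit sharply (multiplicative decrease).
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # At or under target: probe for headroom (additive increase).
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

The returned limit can then serve as the threshold for admission control, so the threshold tightens quickly under pressure and recovers slowly.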

4) Per-Client or Per-Tenant Limits

Prevent one tenant or client from monopolizing capacity.

This isolates heavy users from impacting others.
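A per-tenant token bucket is one common way to enforce such limits. The sketch below is an assumption-laden example: every tenant gets the same rate and burst, buckets are never evicted, and the clock is injectable only to make the behavior easy to test. It is not thread-safe as written.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows up to `burst` requests at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: float, now: float) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = burst
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantLimiter:
    """Keeps one independent bucket per tenant ID."""

    def __init__(self, rate_per_sec: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._burst = burst
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)
```

Because each tenant draws from its own bucket, a tenant that exhausts its tokens is rejected without reducing what other tenants may send.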

Interaction with Other Patterns

  • Retries: must respect rejection signals and avoid immediate retry storms.
  • Circuit breakers: prevent downstream overload.
  • Bulkheads: isolate workloads before shedding occurs.
  • Timeouts: bound how long admitted work can hold resources, so capacity is released quickly.

Load shedding is part of a broader resilience strategy.

Choosing Shedding Signals

Common overload indicators:

  • Queue depth exceeding safe limit
  • Thread pool saturation
  • CPU above threshold
  • P99 latency above SLO target
  • Connection pool exhaustion

Shedding should trigger before complete resource exhaustion.
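The indicators above can be combined into a single check that also reports which signal tripped, which helps when debugging rejections. All limit values here are placeholders chosen for the example, and the field names are illustrative; connection pool exhaustion could be added in the same way.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signals:
    queue_depth: int
    busy_threads: int
    total_threads: int
    cpu_fraction: float      # 0.0 to 1.0
    p99_latency_ms: float


@dataclass(frozen=True)
class Limits:
    # Placeholder limits, set below full exhaustion on purpose.
    max_queue_depth: int = 100
    max_thread_utilization: float = 0.9
    max_cpu_fraction: float = 0.85
    p99_slo_ms: float = 500.0


def tripped_signals(s: Signals, limits: Limits = Limits()) -> List[str]:
    """Returns the names of all overload signals currently over their limit."""
    tripped: List[str] = []
    if s.queue_depth > limits.max_queue_depth:
        tripped.append("queue_depth")
    if s.total_threads > 0 and (
        s.busy_threads / s.total_threads > limits.max_thread_utilization
    ):
        tripped.append("thread_pool")
    if s.cpu_fraction > limits.max_cpu_fraction:
        tripped.append("cpu")
    if s.p99_latency_ms > limits.p99_slo_ms:
        tripped.append("p99_latency")
    return tripped


def should_shed(s: Signals, limits: Limits = Limits()) -> bool:
    """Sheds as soon as any single signal is over its limit."""
    return bool(tripped_signals(s, limits))
```

Shedding on any one signal is the conservative choice; requiring several signals at once would reduce false positives at the cost of reacting later.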

Graceful Error Responses

Rejected requests should return clear signals:

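  • HTTP 429 (Too Many Requests), typically when a specific client has exceeded its limit
  • HTTP 503 (Service Unavailable), typically when the service as a whole is overloaded
  • An explicit Retry-After header telling clients how long to wait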

Clients must be designed to respect these signals.
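Both sides of this contract can be sketched in a few lines. The helper names and default values are hypothetical, responses are modeled as a plain status code plus a header dictionary, and only the delay-in-seconds form of Retry-After is handled (the header may also carry an HTTP date).

```python
import random
from typing import Callable, Dict, Tuple


def rejection_response(per_client_limit_hit: bool,
                       retry_after_s: int) -> Tuple[int, Dict[str, str]]:
    """Server side: picks a status code and attaches a Retry-After hint."""
    status = 429 if per_client_limit_hit else 503
    return status, {"Retry-After": str(retry_after_s)}


def retry_delay(attempt: int, headers: Dict[str, str],
                base_s: float = 0.1, cap_s: float = 30.0,
                rng: Callable[[], float] = random.random) -> float:
    """Client side: seconds to wait before retry number `attempt` (0-based).

    Uses capped exponential backoff with full jitter, and never waits less
    than the server's Retry-After value when one is given in seconds.
    """
    backoff = min(cap_s, base_s * (2 ** attempt))
    jittered = backoff * rng()  # rng() returns a value in [0, 1)
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return max(float(retry_after), jittered)
    return jittered
```

Jitter spreads retries out in time, so clients that were rejected together do not all return at the same instant.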

Observability Requirements

  • Rejection rate per endpoint
  • Active request count
  • Queue depth metrics
  • Latency percentiles
  • Retry amplification ratio

Load shedding must be visible and measurable.
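A minimal in-process version of two of these metrics might look like the sketch below. It assumes the service can tell retries from first attempts, for example through a client-set header, which is itself an assumption; a production system would export these counters to its metrics backend instead of holding them in memory.

```python
from collections import Counter


class SheddingMetrics:
    """Tracks per-endpoint rejections and overall retry amplification."""

    def __init__(self) -> None:
        self.admitted: Counter = Counter()
        self.rejected: Counter = Counter()
        self.first_attempts = 0
        self.total_attempts = 0

    def record(self, endpoint: str, admitted: bool, is_retry: bool) -> None:
        """Records one incoming attempt and its admission decision."""
        (self.admitted if admitted else self.rejected)[endpoint] += 1
        self.total_attempts += 1
        if not is_retry:
            self.first_attempts += 1

    def rejection_rate(self, endpoint: str) -> float:
        """Fraction of attempts on this endpoint that were rejected."""
        total = self.admitted[endpoint] + self.rejected[endpoint]
        return self.rejected[endpoint] / total if total else 0.0

    def retry_amplification(self) -> float:
        """Total attempts divided by first attempts; 1.0 means no retries."""
        if self.first_attempts == 0:
            return 0.0
        return self.total_attempts / self.first_attempts
```

A retry amplification ratio that climbs while the rejection rate is also rising suggests that clients are retrying rejected requests too eagerly.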

Failure Injection Test

# Load shedding validation
1) Gradually increase request rate beyond capacity
2) Observe queue depth and latency growth
3) Confirm load shedding triggers at threshold
4) Verify critical endpoints remain stable
5) Measure rejection percentage under overload
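Before running these steps against a real service, the expected shape of the result can be previewed with a toy simulation. The model below is deliberately crude and entirely hypothetical: time advances in discrete ticks, the server drains a fixed number of requests per tick, and arrivals beyond a bounded queue are rejected.

```python
from typing import Dict, Iterable


def simulate(offered_per_tick: Iterable[int], capacity_per_tick: int,
             max_queue: int) -> Dict[str, float]:
    """Replays an arrival ramp against a bounded queue with shedding."""
    queue = 0
    rejected = 0
    total = 0
    peak_queue = 0
    for arrivals in offered_per_tick:
        total += arrivals
        # Admit only what fits in the queue; shed the rest immediately.
        admitted = min(arrivals, max_queue - queue)
        rejected += arrivals - admitted
        queue += admitted
        peak_queue = max(peak_queue, queue)
        # The server then processes up to its capacity for this tick.
        queue -= min(queue, capacity_per_tick)
    return {
        "rejected_fraction": rejected / total if total else 0.0,
        "peak_queue": peak_queue,
    }
```

For a ramp of 50, 100, 150, 200, 250 arrivals per tick against a capacity of 100 and a queue bound of 150, the queue never exceeds 150 and one third of all requests are rejected, all of them in the last two ticks.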

Common Anti-Patterns

  • Accepting all traffic without limits
  • Rejecting randomly without prioritization
  • No backpressure signal to clients
  • Retrying immediately after rejection
  • No monitoring of rejection metrics

Operational Checklist

  • Are overload thresholds clearly defined?
  • Are critical paths protected from shedding?
  • Do clients respect rejection responses?
  • Is load shedding behavior tested under stress?
  • Are rejection metrics part of SLO monitoring?

Key Takeaways

  • Load shedding prevents cascading overload.
  • Fail-fast behavior is safer than fail-slow behavior.
  • Admission control protects system capacity.
  • Priority-based shedding preserves core functionality.
  • Shedding must integrate with retries, timeouts, and circuit breakers.

Load shedding is a deliberate sacrifice of some requests to save the whole system. In production-grade distributed systems, controlled rejection is a feature, not a failure.