
Retries and Backoff (When Retries Are Harmful)

Retry only transient failures, and only with bounded attempts, exponential backoff, and jitter; unbounded or synchronized retries cause retry storms and amplify outages.


Retry Policy

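  • Retry only transient failures, such as timeouts, connection errors, and most 5xx responses; fail fast on errors a retry cannot fix, such as most 4xx responses.
  • Bound the number of attempts, and grow the delay between them exponentially up to a cap.
  • Add jitter so that clients that failed at the same moment do not retry in lockstep.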
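The first rule is easiest to enforce when errors are classified up front. A minimal sketch, assuming an HTTP-style client that exposes a numeric status code; the status set, the exception tuple, and the helper name `is_transient` are illustrative choices, not a standard. 501 Not Implemented is deliberately left out because a retry will not change it.

```python
from typing import Optional

# Hypothetical policy: status codes treated as transient. Tune per dependency.
RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})

# Network-level errors that usually clear up on their own.
# ConnectionResetError, ConnectionRefusedError, etc. are subclasses of ConnectionError.
RETRYABLE_EXCEPTIONS = (TimeoutError, ConnectionError)


def is_transient(
    *, status: Optional[int] = None, exc: Optional[BaseException] = None
) -> bool:
    """Return True if a failure looks transient and may be retried."""
    if exc is not None:
        return isinstance(exc, RETRYABLE_EXCEPTIONS)
    if status is not None:
        return status in RETRYABLE_STATUS
    return False  # nothing to judge: do not retry
```

For 429, a server-supplied `Retry-After` header, when present, is usually a better delay than the computed backoff.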

Simple Backoff with Jitter

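import random
import time

def retry(fn, *, attempts: int, base_backoff: float,
          max_backoff: float = 30.0,
          retryable: tuple = (TimeoutError, ConnectionError)):
    """Call fn(), retrying only `retryable` errors with capped, jittered backoff.

    Any other exception propagates immediately. The max_backoff and
    retryable defaults are assumptions; tune them per dependency.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return fn()
        except retryable:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error, no final sleep
            # Exponential backoff, capped so late attempts do not sleep for minutes.
            sleep = min(max_backoff, base_backoff * (2 ** i))
            sleep *= 0.5 + random.random()  # jitter: scale by a factor in [0.5, 1.5)
            time.sleep(sleep)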
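The jitter above scales each delay by a random factor between 0.5 and 1.5. A common alternative, usually called "full jitter", draws each delay uniformly from zero up to the capped exponential value; it spreads retries more widely, at the cost of sometimes retrying almost immediately. A minimal sketch; the generator name `full_jitter_delays` and its `cap` parameter are illustrative.

```python
import random
from typing import Iterator, Optional


def full_jitter_delays(
    attempts: int,
    base: float,
    cap: float,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield the sleep before each retry (attempts - 1 values in total).

    Each delay is drawn uniformly from [0, min(cap, base * 2**i)].
    """
    if rng is None:
        rng = random.Random()
    for i in range(attempts - 1):  # no sleep after the final attempt
        ceiling = min(cap, base * (2 ** i))
        yield rng.uniform(0.0, ceiling)


# Example: 6 attempts, 0.1 s base, 1 s cap, seeded for reproducibility.
# The ceilings grow 0.1, 0.2, 0.4, 0.8, then hit the 1.0 cap.
delays = list(full_jitter_delays(6, base=0.1, cap=1.0, rng=random.Random(42)))
```

Seeding the generator is only for reproducible tests; in production each client should use its own unseeded randomness, or the jitter stops desynchronizing anything.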

Operational Checklist

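  • Give every attempt its own timeout; without one, retries only extend a hang instead of recovering from it.
  • Enforce a retry budget per request or per service, so retry traffic stays a small, bounded fraction of normal traffic and cannot amplify an outage.
  • Do not retry non-idempotent operations (for example, a POST that charges a card) without safeguards such as idempotency keys.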
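A retry budget can be as small as a token bucket shared by all calls to one dependency. A minimal, single-threaded sketch, assuming a ratio-based policy; the class name `RetryBudget`, the 10% default ratio, and the token cap are illustrative, and a production version would need locking and probably a time-windowed count.

```python
class RetryBudget:
    """Hypothetical token-bucket retry budget.

    Every original request deposits `ratio` tokens; every retry withdraws one
    whole token. Retries are therefore limited to roughly `ratio` times the
    number of requests, and the bucket never holds more than `max_tokens`,
    so a long healthy period cannot bank an unbounded burst of retries.
    """

    def __init__(self, ratio: float = 0.1, max_tokens: float = 10.0) -> None:
        if not 0.0 < ratio <= 1.0:
            raise ValueError("ratio must be in (0, 1]")
        self.ratio = ratio
        self.max_tokens = max_tokens
        self.tokens = 0.0

    def record_request(self) -> None:
        """Call once per original (non-retry) request."""
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_retry(self) -> bool:
        """Return True and spend a token if a retry is allowed right now."""
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


budget = RetryBudget(ratio=0.25)
for _ in range(8):  # 8 requests * 0.25 = 2 tokens
    budget.record_request()
allowed = [budget.try_retry() for _ in range(3)]  # [True, True, False]
```

When `try_retry()` returns False, the caller fails the request instead of retrying. During a broad outage the bucket drains and retries stop on their own, which is the amplification limit the checklist asks for.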

Failure Modes

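  • Retry storm: clients multiply their load on a dependency that is already failing, which can keep it from recovering.
  • Hidden latency: retries make tail latency (p99) worse even when the success rate improves, because a retried request pays for every failed attempt plus every backoff sleep.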
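Both failure modes can be estimated before an incident. If every layer of a call chain retries independently, attempts multiply: three layers that each make 3 attempts can send 3³ = 27 requests to the deepest dependency for one user request. The worst-case latency of one retry loop is likewise bounded by its attempts times the per-attempt timeout, plus its sleeps. A back-of-the-envelope sketch, assuming exponential backoff whose jitter multiplies each delay by less than 1.5, as in `retry` above; the function names and the `max_backoff` cap are hypothetical.

```python
def worst_case_fanout(attempts_per_layer: int, layers: int) -> int:
    """Requests reaching the deepest dependency when every layer retries."""
    return attempts_per_layer ** layers


def worst_case_latency(
    attempts: int, timeout: float, base_backoff: float, max_backoff: float
) -> float:
    """Upper bound for one retry loop: every attempt times out, every sleep maxes out.

    Jitter multiplies each sleep by less than 1.5, so 1.5 is used as the bound.
    There is no sleep after the final attempt.
    """
    sleeps = sum(
        min(max_backoff, base_backoff * (2 ** i)) * 1.5 for i in range(attempts - 1)
    )
    return attempts * timeout + sleeps


print(worst_case_fanout(3, layers=3))  # 27
print(worst_case_latency(3, timeout=2.0, base_backoff=1.0, max_backoff=30.0))  # 10.5
```

With a 2 s timeout and 3 attempts, a single slow call can take up to 10.5 s end to end: 6 s of timed-out attempts plus 1.5 s and 3 s of sleeps. That cost appears in p99, not in the success rate.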