Retries and Timeouts Basics

Add basic retries and timeouts to protect your Rust service from slow or flaky dependencies, using bounded attempts, backoff, and clear failure behavior at production boundaries.

Retries and Timeouts Are Dependency Management

In production, most failures are not "your code crashed." They are timeouts, slowdowns, and intermittent errors in dependencies: databases, upstream HTTP services, caches, and queues. A reliable service must bound how long it waits and how often it retries.

Production mindset:

Timeouts prevent request threads/tasks from getting stuck
Retries recover from transient failures
Limits prevent retry storms that amplify outages

Start With Timeouts: Always Bound Waiting

Before you add retries, ensure every external call has a timeout. Without timeouts, retries can make outages worse by stacking stuck tasks.

Sync example with std::time (conceptual):

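use std::time::Instant;

// Check the deadline before starting a unit of work, so a request that has
// already run out of time fails fast instead of doing more work.
fn do_work_with_deadline(deadline: Instant) -> Result<(), String> {
    if Instant::now() >= deadline {
        return Err("deadline exceeded".to_string());
    }
    // ... perform the bounded work here ...
    Ok(())
}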

In async Rust (Tokio), use timeouts around awaited operations:

use tokio::time::{timeout, Duration};

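async fn call_upstream() -> Result<String, String> {
    let result = timeout(Duration::from_secs(2), async {
        // pretend this is an HTTP call; the returned value is a placeholder
        Ok::<String, String>("response body".to_string())
    })
    .await;

    // The outer Err means the 2-second timer fired before the call finished.
    result.unwrap_or_else(|_| Err("upstream call timed out".to_string()))
}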

Production rule: every network and database operation should be bounded by a timeout or deadline.
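For blocking code that talks to a socket directly, the standard library already exposes both bounds. The following is a minimal sketch rather than a recommended client: `read_with_timeouts` is a hypothetical helper, and the caller picks the two durations.

```rust
use std::io::{self, Read};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: connect and read once, with each phase bounded.
fn read_with_timeouts(
    addr: SocketAddr,
    connect_timeout: Duration,
    read_timeout: Duration,
) -> io::Result<Vec<u8>> {
    // Bound the connection attempt instead of relying on the OS default.
    let mut stream = TcpStream::connect_timeout(&addr, connect_timeout)?;
    // Bound every subsequent read; passing None would mean "wait forever".
    stream.set_read_timeout(Some(read_timeout))?;

    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    Ok(buf[..n].to_vec())
}
```

A read that exceeds its bound comes back as an `io::Error` whose kind is `WouldBlock` or `TimedOut`, depending on the platform, so callers should treat both as a timeout.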

Retries: Only for Transient Failures

Retries are appropriate when failure is likely transient:

Network hiccup
Temporary upstream overload (5xx, typically 502, 503, or 504)
Connection reset
Rate-limited responses (with respect for Retry-After)


Retries are not appropriate for:

Validation errors (most 4xx, such as 400 or 422; 429 rate limiting is the exception)
Authentication failures
Schema errors
Deterministic business rule failures
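The two lists above can be encoded as a single predicate that the retry loop consults before trying again. This is a sketch under assumptions: `CallError` and its variants are hypothetical, and the set of retryable status codes (429, 502, 503, 504) is one reasonable reading of the lists, not a universal rule.

```rust
/// Hypothetical error type for an upstream HTTP call.
#[derive(Debug, PartialEq)]
enum CallError {
    Timeout,
    ConnectionReset,
    Status(u16),
    InvalidRequest(String),
}

impl CallError {
    /// Returns true only for failures that are plausibly transient.
    fn is_retryable(&self) -> bool {
        match self {
            CallError::Timeout | CallError::ConnectionReset => true,
            // Rate limiting and gateway/overload responses (assumed set).
            CallError::Status(code) => matches!(*code, 429 | 502 | 503 | 504),
            // Resending an invalid request produces the same failure.
            CallError::InvalidRequest(_) => false,
        }
    }
}
```

Keeping the decision in one place makes it easy to review which failures the service is willing to retry.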


A Minimal Retry Loop with Backoff

Keep retries bounded and add a small backoff. Even simple exponential backoff eases pressure on a struggling dependency; adding jitter later keeps many clients from retrying in lockstep.

use tokio::time::{sleep, Duration};

async fn retry_simple<F, Fut, T>(
    mut attempts: u32,
    mut f: F,
) -> Result<T, String>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, String>>,
{
    let mut backoff_ms: u64 = 100;

    loop {
        match f().await {
            Ok(v) => return Ok(v),
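            Err(e) => {
                // `attempts` counts total tries; saturating_sub avoids an
                // integer underflow if the caller passes 0.
                attempts = attempts.saturating_sub(1);
                if attempts == 0 {
                    return Err(e);
                }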

                sleep(Duration::from_millis(backoff_ms)).await;
                backoff_ms = (backoff_ms * 2).min(1000);
            }
        }
    }
}

Production note: keep the maximum backoff bounded. Unbounded backoff or unlimited retries can hide incidents and create long-tail latencies.
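To make the bound concrete, the delay schedule can be computed as a pure function. This is a small sketch: `backoff_ms` is a hypothetical helper, and the base of 100 ms and cap of 1000 ms mirror the values used in the loop above.

```rust
/// Delay before retry number `retry` (0-based), doubling from `base_ms`
/// and never exceeding `cap_ms`.
fn backoff_ms(retry: u32, base_ms: u64, cap_ms: u64) -> u64 {
    // The saturating operations keep large retry counts from overflowing u64.
    let factor = 2u64.saturating_pow(retry);
    base_ms.saturating_mul(factor).min(cap_ms)
}

/// The first `count` delays for a given base and cap.
fn backoff_schedule(count: u32, base_ms: u64, cap_ms: u64) -> Vec<u64> {
    (0..count).map(|retry| backoff_ms(retry, base_ms, cap_ms)).collect()
}
```

With a base of 100 and a cap of 1000, the first six delays are 100, 200, 400, 800, 1000, and 1000 milliseconds, so the worst-case added wait is easy to state up front.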

Combine Timeout + Retry Correctly

Each attempt should be bounded by its own timeout, and the whole operation should also have an overall deadline when possible.

use tokio::time::{timeout, Duration};

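async fn call_with_timeout() -> Result<String, String> {
    timeout(Duration::from_secs(2), async {
        // placeholder for the real upstream call
        Ok::<String, String>("response body".to_string())
    })
    .await
    .unwrap_or_else(|_| Err("attempt timed out".to_string()))
}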

Then retry that bounded attempt:

async fn call_with_retry() -> Result<String, String> {
    retry_simple(3, || async {
        call_with_timeout().await
    }).await
}

Production rule: never retry an unbounded operation.
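The overall deadline mentioned at the start of this section can be sketched as well. The version below is a blocking, std-only illustration so that it stands alone; `retry_within_deadline` is a hypothetical helper, and it assumes the `attempt` closure honors the time budget it is handed. In async code the same shape is usually achieved by wrapping the whole retry loop in one outer `tokio::time::timeout`.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `attempt` up to `max_attempts` times, giving each
/// call its own time budget while keeping the whole operation under `overall`.
fn retry_within_deadline<T>(
    max_attempts: u32,
    per_attempt: Duration,
    overall: Duration,
    mut attempt: impl FnMut(Duration) -> Result<T, String>,
) -> Result<T, String> {
    let deadline = Instant::now() + overall;
    let mut backoff = Duration::from_millis(100);
    let mut last_err = String::from("no attempts were made");

    for n in 0..max_attempts {
        // Time left in the overall budget (zero once the deadline has passed).
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Err(format!("deadline exceeded; last error: {last_err}"));
        }
        // An attempt never gets more time than the overall budget has left.
        match attempt(per_attempt.min(remaining)) {
            Ok(value) => return Ok(value),
            Err(e) => last_err = e,
        }
        // Back off before the next attempt, but never sleep past the deadline.
        if n + 1 < max_attempts {
            let left = deadline.saturating_duration_since(Instant::now());
            sleep(backoff.min(left));
            backoff = (backoff * 2).min(Duration::from_secs(1));
        }
    }
    Err(last_err)
}
```

The point of the design is that the caller's worst-case wait is `overall`, regardless of how the attempt count and backoff values are tuned.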

Idempotency: Retries Must Be Safe

Retries can cause duplicate effects if the operation is not idempotent. Reads are usually safe. Writes must be designed carefully.

Examples of safe retry:

GET /resource (read)
PUT with the same payload (idempotent update)
POST with an idempotency key


Production rule: only retry writes if you are confident they are idempotent or protected by idempotency keys.
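To illustrate what an idempotency key buys you, here is a minimal in-memory sketch of the receiving side. `OrderService` and `create_order` are hypothetical names, and a real service would keep the key-to-result mapping in durable storage rather than a `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory record of which idempotency keys have been handled.
struct OrderService {
    by_key: HashMap<String, u64>,
    next_order_id: u64,
}

impl OrderService {
    fn new() -> Self {
        OrderService { by_key: HashMap::new(), next_order_id: 1 }
    }

    /// Creates an order at most once per idempotency key.
    fn create_order(&mut self, idempotency_key: &str) -> u64 {
        if let Some(&existing) = self.by_key.get(idempotency_key) {
            // A retry of a request that already succeeded: return the same result.
            return existing;
        }
        let id = self.next_order_id;
        self.next_order_id += 1;
        self.by_key.insert(idempotency_key.to_string(), id);
        id
    }
}
```

A client that times out and resends the same key gets the original order back instead of creating a second one.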

Prevent Retry Storms

When a dependency is down, retries can multiply load and make recovery harder. Basic protections:

Small max retry count (2-3 attempts)
Backoff with jitter (optional at this stage)
Timeouts on every attempt
Fail fast when the dependency is clearly unhealthy
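For the jitter item in the list above, one common variant ("full jitter") picks a random delay between zero and the current exponential ceiling. The sketch below keeps the randomness outside the function so it stays deterministic and testable: `jittered_backoff_ms` is a hypothetical helper, and `random` is assumed to come from whatever uniform random source the service already uses.

```rust
/// Full-jitter delay: a value between 0 and min(cap_ms, base_ms * 2^retry).
/// `random` is supplied by the caller (assumption: a uniform u64 source).
fn jittered_backoff_ms(retry: u32, base_ms: u64, cap_ms: u64, random: u64) -> u64 {
    let ceiling = base_ms
        .saturating_mul(2u64.saturating_pow(retry))
        .min(cap_ms);
    // Modulo maps the random value into the allowed range; saturating_add
    // avoids overflow when the ceiling is u64::MAX.
    random % ceiling.saturating_add(1)
}
```

Because each client draws a different delay, retries from many clients spread out instead of arriving at the same instant.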


As you mature, you add circuit breakers and bulkheads, but the minimal baseline is bounded retries + timeouts.
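The "fail fast" item does not need a full circuit breaker to get started. As a rough sketch, a guard that counts consecutive failures and rejects calls for a cooldown period might look like this; `FailFast`, its threshold, and its cooldown are hypothetical, and the current time is passed in explicitly to keep the logic easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fail-fast guard: after `threshold` consecutive failures,
/// reject calls until `cooldown` has passed.
struct FailFast {
    threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    open_since: Option<Instant>,
}

impl FailFast {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        FailFast { threshold, cooldown, consecutive_failures: 0, open_since: None }
    }

    /// Returns false while the dependency is considered unhealthy.
    fn allow(&mut self, now: Instant) -> bool {
        match self.open_since {
            Some(opened) if now.saturating_duration_since(opened) < self.cooldown => false,
            Some(_) => {
                // Cooldown over: allow calls again. The failure count is kept,
                // so the next failure trips the guard immediately.
                self.open_since = None;
                true
            }
            None => true,
        }
    }

    /// Records the outcome of a call that was allowed through.
    fn record(&mut self, success: bool, now: Instant) {
        if success {
            self.consecutive_failures = 0;
            self.open_since = None;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
            if self.consecutive_failures >= self.threshold {
                self.open_since = Some(now);
            }
        }
    }
}
```

Callers check `allow` before each attempt and return an error immediately when it is false, which keeps retries from piling onto a dependency that is already down.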

Observability Signals to Add

Even in a minimal setup, emit signals that help you detect dependency issues:

Count retries
Count timeouts
Measure call latency


At least log with stable fields:

tracing::warn!(attempt = 2, "upstream call failed, retrying");

Production rule: do not log full error bodies from external services if they may contain sensitive data.
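The three signals listed above (retry count, timeout count, call latency) can be captured with nothing more than the standard library. This is a sketch only: `CallMetrics` and `timed` are hypothetical names, and a real service would more likely export these values through its metrics library.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical in-process counters for retry and timeout events.
#[derive(Default)]
struct CallMetrics {
    retries: AtomicU64,
    timeouts: AtomicU64,
}

impl CallMetrics {
    fn record_retry(&self) {
        self.retries.fetch_add(1, Ordering::Relaxed);
    }

    fn record_timeout(&self) {
        self.timeouts.fetch_add(1, Ordering::Relaxed);
    }

    /// Current (retries, timeouts) totals.
    fn snapshot(&self) -> (u64, u64) {
        (
            self.retries.load(Ordering::Relaxed),
            self.timeouts.load(Ordering::Relaxed),
        )
    }
}

/// Runs `f` and returns its result together with the elapsed wall time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Atomic counters take `&self`, so one `CallMetrics` value can be shared across tasks or threads without a lock.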

Production Checklist

Timeouts on all external calls
Retries only for transient failures
Small bounded retry count (2-3)
Backoff between attempts
Retries limited to idempotent operations or writes protected by idempotency keys
Retry and timeout signals observable (logs/metrics)


Retries and timeouts are the first line of defense against real-world flakiness. They do not guarantee reliability, but without them, production incidents become inevitable and harder to recover from.
