Async Timeouts
Timeouts: The Simplest Reliability Control
In production, timeouts are one of the highest-leverage reliability practices. A dependency does not have to be down to hurt you; it can be slow. Without timeouts, slow dependencies cause request pileups, task growth, and cascading failures.
Production mindset:
- Every external call must have a timeout
- Timeouts should be consistent and measurable
- Timeouts must not leak work after the caller gives up
Per-Attempt Timeout with tokio::time::timeout
The most common pattern is wrapping an awaited operation in tokio::time::timeout. If the deadline expires, timeout returns an Elapsed error and drops the inner future, which cancels it: the future is never polled again.
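use tokio::time::{timeout, Duration};
async fn fetch() -> Result<String, String> {
    let res = timeout(Duration::from_secs(2), async {
        // pretend this is an HTTP call
        Ok::<String, String>("ok".to_string())
    })
    .await;
    // Outer Err: the deadline elapsed. Inner Result: the call's own outcome.
    res.unwrap_or_else(|_| Err("timeout".to_string()))
}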
Production note: a timeout error is not the same as a dependency error. It means you exceeded your allowed waiting budget.
Timeouts at the Right Layer
Place timeouts at boundaries:
- HTTP client calls
- Database queries
- Cache calls
- Queue publish/consume operations
Avoid sprinkling timeouts deep inside pure domain logic. Timeouts are operational controls, not business rules.
Overall Deadline vs Per-Operation Timeout
Per-operation timeouts protect individual calls. Many production systems also enforce an overall request deadline so a single request cannot consume resources indefinitely across multiple steps.
Conceptual pattern:
- Request has a total budget (example: 1 second)
- Each dependency call uses a portion of that budget
- If budget is exhausted, fail fast
Even if you start with per-operation timeouts, keep this mental model in mind for later maturity.
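The budget model can be sketched with the standard library alone. Deadline and its methods are hypothetical names introduced for illustration; the idea is that each dependency call asks the deadline how long it may wait, and gets an error once the budget is spent.

```rust
use std::time::{Duration, Instant};

/// Hypothetical request-scoped deadline.
#[derive(Clone, Copy, Debug)]
struct Deadline {
    expires_at: Instant,
}

impl Deadline {
    /// Starts a budget that runs out `budget` from now.
    fn after(budget: Duration) -> Self {
        Deadline { expires_at: Instant::now() + budget }
    }

    /// Time left in the budget, or None once it is exhausted.
    fn remaining(&self) -> Option<Duration> {
        let left = self.expires_at.checked_duration_since(Instant::now())?;
        if left.is_zero() { None } else { Some(left) }
    }

    /// Timeout for the next call: the per-call cap, shrunk to what is left.
    /// Err means the budget is gone and the caller should fail fast.
    fn timeout_for(&self, per_call_cap: Duration) -> Result<Duration, &'static str> {
        match self.remaining() {
            Some(left) => Ok(left.min(per_call_cap)),
            None => Err("deadline exceeded"),
        }
    }
}
```

The value returned by timeout_for can be passed straight to tokio::time::timeout, so a late step in the request automatically gets a shorter wait than an early one.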
Timeouts and Retries: Correct Composition
Retries without timeouts are dangerous. Each attempt must be bounded.
Production-safe composition:
- Timeout per attempt
- Small bounded retry count
- Backoff between attempts
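use tokio::time::{timeout, sleep, Duration};
// One bounded attempt: at most 1 second of waiting.
async fn call_once() -> Result<String, String> {
    timeout(Duration::from_secs(1), async {
        // pretend this is the real dependency call
        Ok::<String, String>("ok".to_string())
    })
    .await
    .map_err(|_| "timeout".to_string())?
}
// Up to 3 attempts, with capped exponential backoff between them.
async fn call_with_retry() -> Result<String, String> {
    let mut attempts = 3;
    let mut backoff_ms: u64 = 100;
    loop {
        match call_once().await {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempts -= 1;
                if attempts == 0 {
                    return Err(e);
                }
                sleep(Duration::from_millis(backoff_ms)).await;
                backoff_ms = (backoff_ms * 2).min(1000);
            }
        }
    }
}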
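It helps to work out what this composition costs in the worst case. The sketch below reproduces the backoff arithmetic of the loop (start at 100 ms, double, cap at 1000 ms) with the standard library only; backoff_schedule and worst_case are hypothetical helper names.

```rust
use std::time::Duration;

/// Delays slept between attempts: start at `initial_ms`, double each time,
/// never exceed `cap_ms`. With N attempts there are N - 1 sleeps.
fn backoff_schedule(attempts: u32, initial_ms: u64, cap_ms: u64) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut backoff_ms = initial_ms.min(cap_ms);
    for _ in 1..attempts {
        delays.push(Duration::from_millis(backoff_ms));
        backoff_ms = backoff_ms.saturating_mul(2).min(cap_ms);
    }
    delays
}

/// Upper bound on total time: every attempt times out, plus every sleep.
fn worst_case(attempts: u32, per_attempt: Duration, delays: &[Duration]) -> Duration {
    per_attempt * attempts + delays.iter().sum::<Duration>()
}
```

For 3 attempts with a 1 second per-attempt timeout, the sleeps are 100 ms and 200 ms, so a caller can wait up to 3.3 seconds before seeing the final error. That number should fit inside the caller's own deadline.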
Timeout Values: Practical Defaults
Timeout selection depends on your SLO and dependency behavior. Minimal guidance:
- Prefer short timeouts for upstream calls (hundreds of ms to a few seconds)
- Keep them consistent across the service
- Make them configurable via environment variables for production tuning
Production rule: do not hardcode long timeouts as a way to avoid handling failure. That increases tail latency and reduces capacity.
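A minimal sketch of environment-based configuration, using only the standard library. The function names are hypothetical, and so is any variable name you pass in (for example PAYMENTS_TIMEOUT_MS); the fallback policy shown here (use the default when the value is missing, unparsable, or zero) is one reasonable choice, not the only one.

```rust
use std::env;
use std::time::Duration;

/// Parses a millisecond value, falling back to `default_ms` when the value is
/// missing, not a number, or zero.
fn parse_timeout_ms(raw: Option<&str>, default_ms: u64) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&ms| ms > 0)
        .unwrap_or(default_ms);
    Duration::from_millis(ms)
}

/// Reads the timeout from an environment variable chosen by the caller.
fn timeout_from_env(var: &str, default_ms: u64) -> Duration {
    parse_timeout_ms(env::var(var).ok().as_deref(), default_ms)
}
```

Reading the value once at startup and passing the resulting Duration into the client keeps the timeout consistent across the service.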
Observability: Measure Timeout Rates
Timeouts should create visible signals. Track:
- timeout count per dependency
- latency distribution
- retry counts
Minimal structured log example:
tracing::warn!(dep = "payments", "timeout");
Production note: a spike in timeouts often indicates upstream degradation. It should trigger investigation before it becomes a full outage.
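To make "timeout count per dependency" concrete, here is a small in-process counter built on the standard library. TimeoutMetrics and DepStats are hypothetical types; a real service would more likely export these numbers through its metrics library, and latency histograms are left out of this sketch.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical per-dependency counters.
#[derive(Default, Debug, Clone, Copy, PartialEq)]
struct DepStats {
    calls: u64,
    timeouts: u64,
    retries: u64,
}

/// Hypothetical registry keyed by dependency name.
#[derive(Default)]
struct TimeoutMetrics {
    by_dep: Mutex<HashMap<&'static str, DepStats>>,
}

impl TimeoutMetrics {
    /// Records one finished call: whether it timed out and how many retries it used.
    fn record(&self, dep: &'static str, timed_out: bool, retries: u64) {
        let mut map = self.by_dep.lock().unwrap();
        let stats = map.entry(dep).or_default();
        stats.calls += 1;
        stats.retries += retries;
        if timed_out {
            stats.timeouts += 1;
        }
    }

    /// Fraction of calls that timed out, or None if the dependency was never called.
    fn timeout_rate(&self, dep: &str) -> Option<f64> {
        let map = self.by_dep.lock().unwrap();
        let stats = map.get(dep)?;
        Some(stats.timeouts as f64 / stats.calls as f64)
    }
}
```

An alert on the rate, rather than on the raw count, stays meaningful as traffic grows.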
Common Production Pitfalls
- No timeouts on external calls (stuck tasks)
- Huge timeouts that hide problems
- Retries without per-attempt timeouts
- Not measuring timeouts (silent degradation)
- Timing out but still doing work in the background after the caller gave up
Production Checklist
- Timeouts on all external awaited operations
- Timeouts defined at boundary layers
- Retries composed with per-attempt timeouts
- Timeout values configurable
- Timeout events observable (logs/metrics)
Async timeouts are a minimal production baseline. They bound latency, protect capacity, and keep your service responsive even when dependencies are slow or failing.