timeouts, resilience patterns
Why Missing Timeouts Cause Cascading Failures
In distributed systems, slow is worse than down. If a downstream dependency slows down and you have no timeouts:
- Threads block indefinitely
- Connection pools exhaust
- Request queues grow
- CPU rises due to context switching
- Kubernetes restarts pods
- Traffic shifts to fewer healthy instances
- Entire cluster destabilizes
This is how cascading failure begins.
Incident Scenario: One Slow Dependency Took Down Everything
A payment provider degraded from 150ms to 25 seconds. Your service had no HTTP read timeout. Each request waited. Thread pool of 200 threads filled. New requests queued. Liveness probe failed. Pods restarted. Load concentrated on remaining instances. Full outage in 90 seconds.
Root cause: No timeout. No circuit breaker. No containment.
Anti-Pattern: Infinite Wait + Blind Retry
Common production mistakes:
- No connect timeout
- No read timeout
- Retry on every exception
- Unlimited retries
- Retry without backoff
Retries increase load on a failing dependency. That is not resilience. That is amplification.
Explicit Timeouts Everywhere
Every I/O boundary must define timeouts:
- HTTP client connect timeout
- HTTP client read timeout
- Database query timeout
- Messaging poll timeout
- Overall request deadline
Example RestTemplate configuration:
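    import org.springframework.boot.web.client.RestTemplateBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.web.client.RestTemplate;

    import java.time.Duration;

    @Configuration
    public class HttpClientConfig {

        @Bean
        public RestTemplate restTemplate(RestTemplateBuilder builder) {
            // Note: recent Spring Boot releases (3.4 and later) deprecate these set* methods
            // in favor of connectTimeout(...) and readTimeout(...).
            return builder
                    // Maximum time to establish the connection.
                    .setConnectTimeout(Duration.ofSeconds(2))
                    // Maximum time to wait for response data once connected.
                    .setReadTimeout(Duration.ofSeconds(3))
                    .build();
        }
    }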
Never rely on defaults.
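Some clients and drivers default to an infinite timeout. java.net.URLConnection, for example, uses 0 for both its connect and read timeouts, which means wait forever.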
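The last item in the list above, an overall request deadline, is the one that per-client settings do not cover. The following is a minimal sketch of one way to enforce it with only the JDK. The class name `DeadlineSketch` and the helper `callWithDeadline` are hypothetical names chosen for illustration, and the durations are arbitrary.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class DeadlineSketch {

    /**
     * Runs the call on another thread and waits at most `deadline` for the result.
     * Throws TimeoutException if the deadline passes first.
     */
    static <T> T callWithDeadline(Supplier<T> call, Duration deadline)
            throws TimeoutException, ExecutionException, InterruptedException {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(call);
        try {
            return future.get(deadline.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller stops waiting here. CompletableFuture.cancel does not
            // interrupt the running work, so the underlying client still needs
            // its own connect and read timeouts to release its thread.
            future.cancel(true);
            throw e;
        }
    }

    /** Stand-in for a slow dependency. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fast call: completes well inside the deadline.
        System.out.println(callWithDeadline(() -> "ok", Duration.ofSeconds(1))); // ok

        // Slow call: simulates a degraded dependency that takes 2 seconds.
        try {
            callWithDeadline(() -> {
                sleepQuietly(2000);
                return "late";
            }, Duration.ofMillis(100));
        } catch (TimeoutException e) {
            System.out.println("deadline exceeded"); // deadline exceeded
        }
    }
}
```

A deadline like this bounds how long the caller waits, not how long the work runs. That is why it complements the connect and read timeouts rather than replacing them.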
Circuit Breaker: Fail Fast Instead of Hanging
Circuit breaker behavior:
- Tracks failure rate
- Opens when threshold exceeded
- Immediately rejects new calls
- Periodically tests recovery
This prevents thread exhaustion. Example with Resilience4j:
    import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
    import org.springframework.stereotype.Service;

    @Service
    public class PricingClient {

        // "pricing" names the breaker instance; its thresholds come from configuration.
        @CircuitBreaker(name = "pricing", fallbackMethod = "fallback")
        public Price getPrice(String sku) {
            // call downstream
            return new Price();
        }

        // Same parameters as getPrice, plus the exception that triggered the fallback.
        public Price fallback(String sku, Throwable t) {
            return Price.unavailable(sku);
        }
    }
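To make the four behaviors listed above concrete (track failure rate, open on threshold, reject immediately, probe for recovery), here is a hand-rolled sketch of the state machine. It is an illustration only and not how Resilience4j is implemented. The class `SimpleCircuitBreaker`, its parameters, and the injected clock are all assumptions made for the example, and it ignores some concurrency edge cases that a production library handles.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative count-based circuit breaker; not a production implementation. */
public class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static class CallRejectedException extends RuntimeException {
        CallRejectedException() { super("circuit open: call rejected without waiting"); }
    }

    private final boolean[] window;    // ring buffer of recent outcomes, true = failure
    private final double threshold;    // failure rate that opens the circuit, e.g. 0.5
    private final long openNanos;      // how long to stay open before probing
    private final LongSupplier clock;  // nanosecond clock, injectable for tests

    private State state = State.CLOSED;
    private int recorded, next, failures;
    private long openedAt;
    private boolean probeInFlight;

    SimpleCircuitBreaker(int windowSize, double threshold, Duration openFor, LongSupplier clock) {
        this.window = new boolean[windowSize];
        this.threshold = threshold;
        this.openNanos = openFor.toNanos();
        this.clock = clock;
    }

    synchronized State state() { return state; }

    <T> T call(Supplier<T> action) {
        admit();                        // throws immediately while the circuit is open
        boolean ok = false;
        try {
            T result = action.get();
            ok = true;
            return result;
        } finally {
            record(!ok);                // any exception counts as a failure
        }
    }

    private synchronized void admit() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openNanos) {
            state = State.HALF_OPEN;    // wait is over: allow a single trial call
            probeInFlight = false;
        }
        if (state == State.OPEN || (state == State.HALF_OPEN && probeInFlight)) {
            throw new CallRejectedException();
        }
        if (state == State.HALF_OPEN) probeInFlight = true;
    }

    private synchronized void record(boolean failed) {
        if (state == State.HALF_OPEN) { // the trial call decides: recover or re-open
            if (failed) open(); else close();
            return;
        }
        if (state == State.OPEN) return; // result of a call admitted before opening
        if (recorded == window.length) {
            if (window[next]) failures--; // overwrite the oldest outcome
        } else {
            recorded++;
        }
        window[next] = failed;
        if (failed) failures++;
        next = (next + 1) % window.length;
        // Judge the rate only once the window is full, so one early error cannot open it.
        if (recorded == window.length && (double) failures / recorded >= threshold) open();
    }

    private void open() { state = State.OPEN; openedAt = clock.getAsLong(); }

    private void close() {
        state = State.CLOSED;
        Arrays.fill(window, false);
        recorded = next = failures = 0;
    }

    public static void main(String[] args) {
        long[] now = {0};               // fake clock so the demo needs no sleeping
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(4, 0.5, Duration.ofSeconds(10), () -> now[0]);
        for (int i = 0; i < 4; i++) {   // two successes, then two failures
            final boolean fail = i >= 2;
            try {
                breaker.call(() -> {
                    if (fail) throw new IllegalStateException("downstream error");
                    return "price";
                });
            } catch (IllegalStateException e) {
                // downstream failure, already recorded by the breaker
            }
        }
        System.out.println(breaker.state()); // OPEN
        now[0] += Duration.ofSeconds(10).toNanos();
        breaker.call(() -> "price");         // trial call succeeds
        System.out.println(breaker.state()); // CLOSED
    }
}
```

The point to notice is in `admit`: while the circuit is open, the caller gets an exception without the downstream call being attempted, so no thread is parked waiting on a dependency that is known to be failing.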
Important:
Fallback must not hide critical failures silently.
Sometimes failing fast is safer than returning degraded data.
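One way to apply this is to let the fallback decide, per exception type, whether degrading is acceptable. The sketch below is an assumption-laden illustration: the class `FallbackSketch`, its simplified `Price` type with a `degraded` flag, and the choice of which exceptions count as transient are all hypothetical and would need to match your own domain.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

public class FallbackSketch {

    /** Hypothetical result type that makes a degraded answer visible to callers. */
    static final class Price {
        final String sku;
        final boolean degraded;

        Price(String sku, boolean degraded) {
            this.sku = sku;
            this.degraded = degraded;
        }
    }

    static Price fallback(String sku, Throwable cause) {
        // Assumption: timeouts and I/O errors are transient and safe to degrade on.
        if (cause instanceof TimeoutException || cause instanceof IOException) {
            // Log it, so the degradation is never silent.
            System.err.println("pricing degraded for " + sku + ": " + cause);
            return new Price(sku, true);
        }
        // Everything else (bugs, bad input) fails fast instead of being masked.
        throw new IllegalStateException("pricing failed for " + sku, cause);
    }

    public static void main(String[] args) {
        Price p = fallback("sku-1", new TimeoutException("read timed out"));
        System.out.println(p.degraded); // true

        try {
            fallback("sku-1", new NullPointerException("bug in mapping code"));
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getCause());
        }
    }
}
```

With Resilience4j, calls rejected by an open circuit reach the fallback as a `CallNotPermittedException`, so a real fallback would also need to decide which branch that case belongs to.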