Timeouts & Deadlines (Stop Hanging Threads)

Timeouts are not optional. Without explicit time budgets, hung upstream calls pile up, exhaust the thread pool, and cascade into a full outage. This page covers per-request deadlines, connect timeouts, and end-to-end cancellation wiring.

Production incident

An upstream endpoint becomes slow but does not fail. Requests hang for minutes. Your service keeps accepting traffic and waiting. The thread pool queue grows, memory climbs due to buffered responses, and the whole API becomes unresponsive. The root cause is simple: no explicit timeouts, no cancellation propagation, and an HttpClient timeout that is either left at its 100-second default, set to infinite, or otherwise misaligned with your SLO.

Symptoms

Latency climbs gradually, then falls off a cliff under concurrency.
Many requests stuck in "in flight" state with no response.
Thread pool starvation signals, increased queue length, and rising memory due to pending tasks and buffers.
The upstream dependency looks "alive" but slow, causing a slow-loris style failure at the application level.

Causes

No deadline: outbound calls wait indefinitely.
Wrong timeout layer: setting only HttpClient.Timeout, or only the server request timeout, without aligning budgets across the stack.
Cancellation not wired: CancellationToken is ignored, so aborting the request does not abort the outbound call.
Connect vs request: you need both a connect timeout and a total request deadline; they are not the same thing.

Diagnosis

# Find missing cancellation propagation
grep -R "CancellationToken" -n . | head
grep -R "GetAsync(" -n .
grep -R "PostAsync(" -n .

# Look for long running outbound spans in tracing
# If you do not have traces, log outbound duration and status codes

Confirm whether aborted inbound requests still continue to call upstream. This is a classic leak of work and money in production.
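If tracing is not available, one way to log outbound duration and status codes is a small DelegatingHandler. This is a minimal sketch, not a definitive implementation: the class name OutboundTimingHandler and the Action<string> log sink are assumptions made for illustration, and a real service would likely use its own logger.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: times each outbound call and writes one line per call.
public sealed class OutboundTimingHandler : DelegatingHandler
{
    private readonly Action<string> _log; // assumed log sink, e.g. Console.WriteLine

    public OutboundTimingHandler(Action<string> log) => _log = log;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            var response = await base.SendAsync(request, cancellationToken);
            _log($"{request.Method} {request.RequestUri} -> {(int)response.StatusCode} in {sw.ElapsedMilliseconds} ms");
            return response;
        }
        catch (OperationCanceledException)
        {
            // Covers both deadline expiry and an aborted inbound request.
            _log($"{request.Method} {request.RequestUri} -> cancelled after {sw.ElapsedMilliseconds} ms");
            throw;
        }
    }
}
```

With HttpClientFactory, a handler like this can be attached through AddHttpMessageHandler on the client builder. With ResponseHeadersRead, the measured duration ends when the headers arrive, not when the body has been read. If "cancelled" lines never appear for aborted inbound requests, the token is probably not reaching the outbound call.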

Anti-pattern

// No cancellation, no deadline. This will accumulate hanging calls.
public Task<HttpResponseMessage> CallUpstream()
{
    return _http.GetAsync("api/slow");
}

// Misleading: one global timeout that does not match per-endpoint budgets
_http.Timeout = TimeSpan.FromMinutes(5);

Correct pattern

Set budgets per call, propagate CancellationToken, and use a linked token source to enforce a hard deadline.

public async Task<HttpResponseMessage> CallUpstreamAsync(CancellationToken requestAborted)
{
    // Example budget: total 2 seconds for this downstream call
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestAborted);
    cts.CancelAfter(TimeSpan.FromSeconds(2));

    using var req = new HttpRequestMessage(HttpMethod.Get, "api/slow");

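    // ResponseHeadersRead returns as soon as headers arrive instead of buffering the whole body.
    // Note: the 2-second deadline ends when this method returns (cts is disposed), so the
    // caller must pass its own token when reading the body and must dispose the response.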
    var resp = await _http.SendAsync(
        req,
        HttpCompletionOption.ResponseHeadersRead,
        cts.Token
    );

    return resp;
}
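When the linked token fires, the caller usually wants to know whether the budget expired or the inbound request was aborted, because the two are reported differently (a timeout counts against the upstream, an abort does not). The following is a sketch of one way to tell them apart; DeadlineRunner and CallOutcome are hypothetical names introduced here, not part of any library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum CallOutcome { Completed, TimedOut, CallerAborted }

// Hypothetical helper: runs a call under a linked deadline and classifies the result.
public static class DeadlineRunner
{
    public static async Task<CallOutcome> RunAsync(
        Func<CancellationToken, Task> call, TimeSpan budget, CancellationToken requestAborted)
    {
        // The linked source fires on caller abort or when the budget runs out.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestAborted);
        cts.CancelAfter(budget);
        try
        {
            await call(cts.Token);
            return CallOutcome.Completed;
        }
        catch (OperationCanceledException) when (requestAborted.IsCancellationRequested)
        {
            // The inbound request went away; nobody is waiting for the result.
            return CallOutcome.CallerAborted;
        }
        catch (OperationCanceledException)
        {
            // The caller did not abort, so the cancellation came from the deadline.
            return CallOutcome.TimedOut;
        }
    }
}
```

A call site could pass `ct => _http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead, ct)` as the delegate and map TimedOut to a 504-style response, while CallerAborted needs no response at all.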

Connect timeout and handshake considerations

Some hangs are connect or TLS handshake stalls. Prefer configuring handler-level timeouts where available and keep them tighter than your total budget.

services.AddHttpClient("UpstreamB")
    .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
    {
        ConnectTimeout = TimeSpan.FromSeconds(1),
        PooledConnectionLifetime = TimeSpan.FromMinutes(5)
    });

Budget alignment

Inbound server budget: if your endpoint SLO is 1 second, your downstream calls cannot each take 1 second. You need a budget split.
Per-hop budgets: allocate time for retries, queueing, and serialization.
Hard stop: always have a total deadline, even if you allow some retries inside.
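The budget split above can be sketched as a small helper that tracks one total deadline and hands each hop a slice of whatever is left. RequestBudget and its members are hypothetical names, and the 800 ms and 300 ms figures in the usage note are illustrative assumptions, not recommendations.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: one total deadline, per-hop slices capped by what remains.
public sealed class RequestBudget
{
    private readonly Stopwatch _elapsed = Stopwatch.StartNew();
    private readonly TimeSpan _total;

    public RequestBudget(TimeSpan total) => _total = total;

    // Time left before the hard stop; never negative.
    public TimeSpan Remaining
    {
        get
        {
            var left = _total - _elapsed.Elapsed;
            return left > TimeSpan.Zero ? left : TimeSpan.Zero;
        }
    }

    // A hop gets at most `cap`, and never more than what is left of the total.
    public TimeSpan ForHop(TimeSpan cap)
    {
        var left = Remaining;
        return left < cap ? left : cap;
    }
}
```

For an endpoint with a 1-second SLO, a handler might create `new RequestBudget(TimeSpan.FromMilliseconds(800))` at the start of the request and call `cts.CancelAfter(budget.ForHop(TimeSpan.FromMilliseconds(300)))` before each outbound call or retry. Once the total is spent, ForHop returns zero and CancelAfter cancels immediately, which gives the hard stop.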

Security and performance impact

Performance: hung calls are a resource leak. Timeouts protect capacity and reduce tail latency.
Security: slow upstream dependencies can be used to degrade availability. Proper deadlines reduce application-level DoS impact.

Operational notes

Monitoring: timeout rate per upstream and per endpoint, duration histograms, cancelled request counts, and concurrent outbound calls.
Rollout: introduce timeouts gradually. Budgets that are too aggressive can increase the error rate. The goal is controlled failure, not random failure.
Rollback: keep budgets configurable so you can relax them temporarily during upstream incidents, without a redeploy.
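As a rough sketch of the monitoring signals listed above, the System.Diagnostics.Metrics API (available from .NET 6) can record a timeout counter and a duration histogram tagged by upstream. The meter name "MyService.Outbound", the instrument names, and the OutboundMetrics class are assumptions for illustration; exporting these to a dashboard depends on whatever metrics pipeline is already in place.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical metrics holder; all names are placeholders.
public static class OutboundMetrics
{
    private static readonly Meter OutboundMeter = new("MyService.Outbound");

    private static readonly Counter<long> Timeouts =
        OutboundMeter.CreateCounter<long>("outbound.timeouts");

    private static readonly Histogram<double> Duration =
        OutboundMeter.CreateHistogram<double>("outbound.duration", unit: "ms");

    // Call once per outbound call, after it completes or is cancelled by the deadline.
    public static void Record(string upstream, double elapsedMs, bool timedOut)
    {
        var tag = new KeyValuePair<string, object?>("upstream", upstream);
        Duration.Record(elapsedMs, tag);
        if (timedOut)
        {
            Timeouts.Add(1, tag);
        }
    }
}
```

Tagging by upstream keeps the timeout rate per dependency visible, so one slow upstream does not hide behind a healthy aggregate.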

Checklist

Every outbound call has a total deadline.
CancellationToken is propagated from inbound request to outbound call.
Connect timeout is configured where appropriate.
Response buffering is controlled (ResponseHeadersRead for large payloads).
Dashboards show timeout rate and long-tail durations.
