HttpClientFactory: Pooling & DNS Reality

HttpClientFactory fixes the classic socket exhaustion trap, but it does not magically make outbound HTTP safe. This page covers handler pooling, DNS refresh, lifetime tuning, and how to stop stale IPs and connection storms from taking your service down.

Production incident

You deploy a service that calls an upstream behind a load balancer. Everything looks fine for hours. Then a routine upstream rotation happens and the load balancer changes IPs. Your service keeps calling dead IPs, requests hang, the thread pool queue grows, and latency explodes. At the same time, you see a spike in ephemeral port usage and TIME_WAIT sockets because a teammate started creating a new HttpClient per request to "avoid stale DNS". You now have two outages: stale DNS and socket exhaustion.

What actually happened

Creating HttpClient per request creates new sockets aggressively. Under load, you run out of ephemeral ports and you drown in TIME_WAIT.
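Keeping a single static HttpClient with default settings can pin connections to old endpoints. DNS is resolved only when a connection is opened, and by default pooled connections have no maximum lifetime, so under steady traffic they are reused indefinitely.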
IHttpClientFactory solves both by pooling handlers and rotating them on a schedule, but you must configure lifetimes and per-client policies intentionally.

Symptoms

Outbound calls start timing out in bursts after upstream IP rotation or failover.
Connection-level errors: connection refused, no route to host, name resolution errors, TLS handshake failures under load.
App CPU might be moderate, but thread pool queue length grows because requests are waiting on I/O.
OS networking shows high socket counts, TIME_WAIT spikes, or SNAT exhaustion in cloud environments.

Root causes

Per-request HttpClient: creates too many sockets and defeats pooling.
Infinite handler lifetime: DNS changes do not take effect for existing connections; you keep calling dead IPs.
Wrong client usage: using the wrong named client, missing base address, missing default headers, or bypassing the factory.

Diagnosis

# App-side: look for incorrect construction
grep -R "new HttpClient" -n .
grep -R "IHttpClientFactory" -n .

# Linux quick checks (if you have node access)
ss -s
# -H drops the header row so it is not counted as a socket
ss -Htan state time-wait | wc -l
ss -Htan state established | wc -l

# Container level (if available)
cat /proc/net/sockstat

In logs and tracing, confirm whether failures correlate with upstream changes, deployment windows, or traffic spikes. If you have distributed tracing, check outbound span counts and duration distributions.
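
If you do not have tracing in place yet, one low-cost way to get the correlation data described above is a delegating handler that logs the target host, outcome, and elapsed time of every outbound call. The following is a minimal sketch, not a definitive implementation: the class name OutboundTimingHandler and the log message shapes are assumptions made for illustration, and it measures time until response headers arrive, not until the body is fully read.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

// Hypothetical handler name; writes one log line per outbound call.
public sealed class OutboundTimingHandler : DelegatingHandler
{
    private readonly ILogger<OutboundTimingHandler> _log;

    public OutboundTimingHandler(ILogger<OutboundTimingHandler> log) => _log = log;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken ct)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            var response = await base.SendAsync(request, ct);
            // Elapsed time here covers connect, TLS, send, and waiting for response headers.
            _log.LogInformation("Outbound {Method} {Host} -> {Status} in {ElapsedMs} ms",
                request.Method, request.RequestUri?.Host,
                (int)response.StatusCode, sw.ElapsedMilliseconds);
            return response;
        }
        catch (Exception ex) when (ex is HttpRequestException or OperationCanceledException)
        {
            // Connection-level failures and timeouts/cancellations end up here.
            _log.LogWarning(ex, "Outbound {Method} {Host} failed after {ElapsedMs} ms",
                request.Method, request.RequestUri?.Host, sw.ElapsedMilliseconds);
            throw;
        }
    }
}

// Registration sketch (assumes the "UpstreamA" named client used on this page):
// services.AddTransient<OutboundTimingHandler>();
// services.AddHttpClient("UpstreamA").AddHttpMessageHandler<OutboundTimingHandler>();
```

Grouping these log lines by host and minute makes it easier to see whether a burst of failures lines up with an upstream rotation or with a deployment of your own service.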

Anti-pattern

// This will blow up in prod under load (socket exhaustion)
public async Task<string> CallUpstream()
{
    using var client = new HttpClient();
    return await client.GetStringAsync("https://upstream/api");
}

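// This can blow up later: with default handler settings, pooled connections
// have no maximum lifetime, so DNS changes are never picked up under steady traffic
public static readonly HttpClient Client = new HttpClient();
// No connection or handler rotation, no per-service policy boundary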

Correct pattern

Use IHttpClientFactory and configure handler lifetimes and connection behavior. Separate clients by upstream and by risk profile.

// Program.cs / Startup.cs
services.AddHttpClient("UpstreamA", client =>
{
    client.BaseAddress = new Uri("https://upstream-a/");
    client.DefaultRequestHeaders.UserAgent.ParseAdd("myservice/1.0");
})
.ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
{
    // Connection pooling defaults are usually fine, but tune for your environment:
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(2),
    PooledConnectionLifetime = TimeSpan.FromMinutes(10), // connections older than this are retired; new ones re-resolve DNS
    MaxConnectionsPerServer = 128,
    AutomaticDecompression = System.Net.DecompressionMethods.GZip | System.Net.DecompressionMethods.Deflate
})
.SetHandlerLifetime(TimeSpan.FromMinutes(10)); // handler rotation for DNS refresh

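// Usage
// Register this class as transient or scoped. A singleton would hold one HttpClient
// (and its handler) for the life of the app and bypass handler rotation.
public class UpstreamClient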
{
    private readonly HttpClient _http;

    public UpstreamClient(IHttpClientFactory factory)
        => _http = factory.CreateClient("UpstreamA");

    public Task<HttpResponseMessage> GetHealthAsync(CancellationToken ct)
        => _http.GetAsync("health", ct);
}
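
A typed client is an alternative to looking up the named client by string, which removes the "wrong named client" failure mode. The sketch below assumes hypothetical names (UpstreamATypedClient, AddUpstreamA) and reuses the base address and lifetime from the example above; treat it as one possible shape rather than the required one.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical typed client: the factory injects an HttpClient that is
// already configured for this upstream.
public sealed class UpstreamATypedClient
{
    private readonly HttpClient _http;

    public UpstreamATypedClient(HttpClient http) => _http = http;

    public Task<HttpResponseMessage> GetHealthAsync(CancellationToken ct)
        => _http.GetAsync("health", ct);
}

public static class UpstreamARegistration
{
    // Hypothetical extension method that keeps all UpstreamA settings in one place.
    public static IServiceCollection AddUpstreamA(this IServiceCollection services)
    {
        services.AddHttpClient<UpstreamATypedClient>(client =>
            {
                client.BaseAddress = new Uri("https://upstream-a/");
            })
            .SetHandlerLifetime(TimeSpan.FromMinutes(10));
        return services;
    }
}
```

AddHttpClient<TClient> registers the typed client as transient, so it should be injected into transient or scoped services. Capturing it in a singleton has the same effect as holding a static HttpClient.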

DNS and lifetime guidance

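Handler lifetime: IHttpClientFactory rotates handlers every 2 minutes by default. If you disable rotation, connections can stay pinned to stale endpoints. If you rotate too aggressively, you lose connection reuse and pay more handshake cost.
Starting points: a lifetime of 5 to 10 minutes is a common baseline for services behind load balancers. Adjust based on upstream rotation frequency and connection cost.
PooledConnectionLifetime: caps how long a pooled connection may be reused. Once a connection is older than this, it is closed after its in-flight request completes instead of going back to the pool, even if it is busy. Replacement connections resolve DNS again, which helps spread load and pick up new endpoints.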
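
To get a feel for the handshake cost of a given lifetime, a back-of-envelope upper bound can help: assume every pooled connection is replaced once per lifetime. The helper below is a rough sketch with hypothetical names, and the assumption that pools are always full overstates the real rate for lightly loaded services.

```csharp
using System;

public static class HandshakeEstimate
{
    // Upper-bound estimate: each pooled connection on each instance is
    // replaced (new TCP connect plus TLS handshake) once per lifetime.
    public static double NewConnectionsPerSecond(
        int instances, int connectionsPerInstance, TimeSpan lifetime)
        => instances * connectionsPerInstance / lifetime.TotalSeconds;
}
```

For example, 20 instances with 128 connections each and a 10-minute lifetime gives 2560 / 600, roughly 4.3 new connections per second against the upstream. Dropping the lifetime to 30 seconds raises that to 2560 / 30, roughly 85 per second, which is the kind of jump that can hurt an upstream's TLS termination layer.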

Security and performance impact

Performance: correct pooling reduces latency and CPU by reusing connections and TLS sessions.
Security: stable client configuration prevents accidental header leakage and supports consistent TLS policy (protocols, cert validation, proxy rules).

Operational notes

Monitoring: outbound latency, timeout rate, DNS failures, connection errors, active sockets, and retry volume.
Rollout: change lifetimes gradually. A too-short lifetime can spike TLS handshakes and hurt upstream.
Rollback: keep configuration behind a feature flag (handler lifetime, max connections). Be able to revert without redeploying code.
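
One way to make lifetimes and connection limits revertible without a code change is to read them from configuration at startup. This is a sketch under stated assumptions: the configuration keys (Upstreams:UpstreamA:HandlerLifetimeMinutes, Upstreams:UpstreamA:MaxConnectionsPerServer) and the extension method name are hypothetical, and the defaults simply mirror the values used earlier on this page.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

public static class UpstreamAConfiguredRegistration
{
    // Hypothetical config keys; fall back to the values used on this page.
    public static IServiceCollection AddUpstreamAFromConfig(
        this IServiceCollection services, IConfiguration config)
    {
        var section = config.GetSection("Upstreams:UpstreamA");
        var lifetime = TimeSpan.FromMinutes(
            section.GetValue<int>("HandlerLifetimeMinutes", 10));
        var maxConnections = section.GetValue<int>("MaxConnectionsPerServer", 128);

        services.AddHttpClient("UpstreamA")
            .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
            {
                // Use the same value for connection and handler rotation.
                PooledConnectionLifetime = lifetime,
                MaxConnectionsPerServer = maxConnections
            })
            .SetHandlerLifetime(lifetime);
        return services;
    }
}
```

In this sketch the values are read once at startup, so a rollback means changing configuration and restarting the service, not shipping a new build.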

Checklist

No per-request HttpClient creation on hot paths.
Each upstream has a named or typed client with explicit configuration.
Handler lifetime and pooled connection lifetime are set intentionally.
Connection limits are set to prevent overload (MaxConnectionsPerServer).
Dashboards show outbound failures, latency, and socket pressure signals.
