Graceful Shutdown Basics

Graceful shutdown is how you avoid dropping in-flight requests and corrupting state during deploys, autoscaling, and node failures. Wire cancellation, stop accepting traffic, and drain correctly.

Graceful Shutdown Is a Production Contract

In production, your process will be killed. Regularly.

- Deployments
- Autoscaling
- Node drains
- OOM kills
- Spot/preemptible termination

If your service cannot shut down gracefully, you will:

- drop in-flight requests
- create partial writes
- break idempotency assumptions
- amplify outages during deploys

Graceful shutdown is not “nice to have”. It is required behavior.

Real Production Incident

Symptoms:

- During deploy, error rate spikes for 2–5 minutes.
- Clients retry aggressively, causing a traffic surge.
- Database sees a burst of duplicate writes.
- Operators conclude “deployments are risky”.

Root cause:

- Pods were terminated while still serving traffic.
- The load balancer stopped routing too late, or the app kept accepting requests.
- Background work was killed mid-flight.
- Requests were not cancel-aware, so shutdown exceeded the termination window and the process was killed with SIGKILL.

This is not a Kubernetes problem. It is a shutdown design failure.

What “Graceful” Actually Means

A correct shutdown sequence looks like this:

1) Stop receiving new traffic (fail readiness / deregister)
2) Allow in-flight requests to complete (drain)
3) Cancel background work safely
4) Flush logs/metrics buffers if needed
5) Exit before termination timeout

Production rule: If you cannot finish work safely within the termination window, you must design for interruption (idempotency, resumable jobs).
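The production rule above asks for work that survives interruption. A minimal sketch of one reading of “resumable”, assuming the job is a list of items and progress can be stored as an index: the job checks for cancellation between items and records a checkpoint after each one. `ICheckpointStore`, `InMemoryCheckpointStore`, and `ResumableJob` are hypothetical names introduced here, and the in-memory store is a stand-in for a database or blob you would use in practice.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical checkpoint abstraction; a real one would persist to durable storage.
public interface ICheckpointStore
{
    Task<int> LoadAsync(CancellationToken ct);
    Task SaveAsync(int nextIndex, CancellationToken ct);
}

// In-memory stand-in so the sketch runs. It does NOT survive a process restart.
public sealed class InMemoryCheckpointStore : ICheckpointStore
{
    private int _next;

    public Task<int> LoadAsync(CancellationToken ct) => Task.FromResult(_next);

    public Task SaveAsync(int nextIndex, CancellationToken ct)
    {
        _next = nextIndex;
        return Task.CompletedTask;
    }
}

public static class ResumableJob
{
    // Processes items starting at the saved checkpoint.
    // Returns true when every item is done, false when cancellation interrupted the run.
    public static async Task<bool> RunAsync(
        IReadOnlyList<string> items,
        ICheckpointStore checkpoints,
        Func<string, Task> handleItem,
        CancellationToken ct)
    {
        var start = await checkpoints.LoadAsync(ct);
        for (var i = start; i < items.Count; i++)
        {
            if (ct.IsCancellationRequested)
            {
                return false; // stop between items; the checkpoint already points at item i
            }

            await handleItem(items[i]);

            // Saved with CancellationToken.None so a finished item is always recorded.
            await checkpoints.SaveAsync(i + 1, CancellationToken.None);
        }

        return true;
    }
}
```

A kill between `handleItem` and `SaveAsync` replays that one item on the next run, so `handleItem` still has to be idempotent. The checkpoint narrows the replay window. It does not remove it.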

Symptom → Cause → Diagnosis → Fix

Symptom:

- Spike in 499/502/503 during deploys
- Increased retries and duplicates
- Requests abruptly cut off

Cause:

- App keeps accepting requests after termination begins
- No cancellation propagation
- Long-running requests without deadlines
- Background services ignoring CancellationToken

Diagnosis:

- Correlate deploy timestamp with error spikes.
- Inspect ingress/load balancer logs for client disconnects.
- Check pod termination events and termination grace period.
- Confirm readiness behavior during shutdown.

Fix:

- Implement readiness-driven draining.
- Ensure request handlers respect HttpContext.RequestAborted.
- Wire CancellationToken through all long operations.
- Make background jobs cancel-aware and resumable.

Anti-Pattern: Fire-and-Forget Work in Request Path

This is a reliability trap:

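app.MapPost("/process", (RequestDto dto) =>
{
    // Anti-pattern: the task is detached from the request and from host shutdown.
    // Nothing awaits it, observes its exceptions, or cancels it.
    _ = Task.Run(() => DoWork(dto));
    return Results.Accepted();
});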

What happens in production:

- Work can outlive the request scope.
- Work ignores shutdown cancellation.
- During deploy, tasks are killed mid-flight with no recovery.
- You get partial effects and inconsistent state.

If it matters, make it durable (queue/outbox) or finish it before responding.
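One way to read “make it durable”: persist the work item first, acknowledge second. The sketch below assumes a hypothetical `IOutbox` abstraction over a durable store (an outbox table or a message broker). `InMemoryOutbox` exists only so the example runs and is not durable. `RequestDto` is given an illustrative shape because the original does not define it.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative shape only; the real RequestDto is not shown in this article.
public sealed record RequestDto(string Id, string Payload);

// Hypothetical abstraction over a durable store (outbox table, message broker).
public interface IOutbox
{
    Task EnqueueAsync(RequestDto dto, CancellationToken ct);
}

// In-memory stand-in for demonstration. NOT durable: contents are lost on exit.
public sealed class InMemoryOutbox : IOutbox
{
    private readonly ConcurrentQueue<RequestDto> _items = new();

    public int Count => _items.Count;

    public Task EnqueueAsync(RequestDto dto, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        _items.Enqueue(dto);
        return Task.CompletedTask;
    }
}

public static class ProcessEndpoint
{
    // Handler logic: the durable write completes before the client hears "accepted".
    public static async Task<int> HandleAsync(RequestDto dto, IOutbox outbox, CancellationToken ct)
    {
        await outbox.EnqueueAsync(dto, ct);
        return 202; // HTTP 202 Accepted
    }
}
```

In a minimal API the same logic would sit inside the `MapPost` lambda and return `Results.Accepted()`. A separate worker then reads from the outbox, so a deploy interrupts processing of an item but never loses the item itself.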

Correct Pattern: Cancellation-Aware Request Handling

Use HttpContext.RequestAborted and propagate it. Minimal API example:

app.MapGet("/heavy", async (HttpContext ctx, SomeService svc) =>
{
    await svc.DoHeavyWorkAsync(ctx.RequestAborted);
    return Results.Ok();
});

Service code:

public sealed class SomeService
{
    public async Task DoHeavyWorkAsync(CancellationToken ct)
    {
        // Example: downstream call with cancellation
        await Task.Delay(TimeSpan.FromSeconds(2), ct);
    }
}

Production rule: Every long operation must accept a CancellationToken.

Stop Accepting Traffic Before Killing the Process

Graceful shutdown starts at the load balancer / readiness layer.

If you use Kubernetes:

- the readiness probe controls routing
- terminationGracePeriodSeconds controls the shutdown window

App responsibility:

- become NotReady quickly when shutdown begins
- stop accepting new work

A common approach is to fail readiness once the host signals ApplicationStopping. The exact wiring depends on your health check implementation, but the principle is fixed: readiness should flip to unhealthy during shutdown so the platform drains you.
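A minimal sketch of that principle, assuming the standard ASP.NET Core health check abstractions: a check that reports Unhealthy once `IHostApplicationLifetime.ApplicationStopping` has fired. The class name `ShutdownAwareReadinessCheck` is a placeholder introduced for this example.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Hosting;

// Reports Unhealthy as soon as the host has begun stopping.
public sealed class ShutdownAwareReadinessCheck : IHealthCheck
{
    private readonly IHostApplicationLifetime _lifetime;

    public ShutdownAwareReadinessCheck(IHostApplicationLifetime lifetime)
    {
        _lifetime = lifetime;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // ApplicationStopping is a CancellationToken that is canceled when shutdown starts.
        var result = _lifetime.ApplicationStopping.IsCancellationRequested
            ? HealthCheckResult.Unhealthy("Shutting down")
            : HealthCheckResult.Healthy();

        return Task.FromResult(result);
    }
}
```

Registration might look like `builder.Services.AddHealthChecks().AddCheck<ShutdownAwareReadinessCheck>("shutdown", tags: new[] { "ready" })`, with the readiness endpoint mapped through `app.MapHealthChecks` and a tag predicate. Whether the probe actually observes the flip before the server stops listening depends on probe interval and hosting behavior, so treat that as something to verify in your environment.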

Background Services Must Respect Cancellation

Anti-pattern: ignoring the stopping token.

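public sealed class BadWorker : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Bug: stoppingToken is never checked or passed on,
        // so this loop only ends when the process is killed.
        while (true)
        {
            await DoWorkAsync();
        }
    }
}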

Correct pattern:

public sealed class GoodWorker : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await DoWorkAsync(stoppingToken);
        }
    }

    private static async Task DoWorkAsync(CancellationToken ct)
    {
        await Task.Delay(TimeSpan.FromSeconds(1), ct);
    }
}

Production rule: Every loop must have a cancellation exit. Every wait must be cancelable.

Termination Windows and “Hanging Shutdown”

If shutdown takes too long:

- platform sends SIGKILL
- in-flight work is cut
- you lose logs/metrics buffers

You need:

- request deadlines/timeouts
- bounded background work
- idempotency for unfinished tasks

If you require more time, increase termination grace period carefully, but do not use it as a crutch for broken cancellation logic.
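Request deadlines can be sketched with a linked token source: the operation is canceled when the caller's token fires or when the deadline elapses, whichever comes first. `Deadlines.RunWithDeadlineAsync` is a hypothetical helper written for this example, not a framework API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Deadlines
{
    // Runs the operation with a token that is canceled when either
    // requestAborted fires or the deadline elapses.
    public static async Task<T> RunWithDeadlineAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan deadline,
        CancellationToken requestAborted)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestAborted);
        cts.CancelAfter(deadline);

        // The operation must pass this token to its own waits and I/O,
        // otherwise the deadline has no effect.
        return await operation(cts.Token);
    }
}
```

A handler could call it as `await Deadlines.RunWithDeadlineAsync(ct => svc.LoadAsync(ct), TimeSpan.FromSeconds(10), ctx.RequestAborted)`, where `svc.LoadAsync` stands for any cancel-aware call. On the host side, `HostOptions.ShutdownTimeout` bounds how long the generic host waits during stop. Keeping it below the platform's grace period leaves room to exit before SIGKILL.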

Operational Notes

Monitoring:

- Track deploy-time error spikes (a sign of bad draining).
- Track request aborts / client disconnects.
- Track shutdown duration if you log it.

Rollout strategy:

- Canary deploy and watch for increased 5xx and aborted requests.
- Validate that pods stop receiving traffic quickly after termination begins.

Rollback:

- If deploy introduces shutdown regression (errors spike only on deploy), rollback immediately.
- Graceful shutdown regressions are repeatable and will hurt every deploy.

Risk management:

- For non-idempotent endpoints, consider idempotency keys.
- For background work, use durable queues/outbox and resumable processing.
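Shutdown duration is cheap to log. A minimal sketch, assuming the generic host's `IHostApplicationLifetime` events: start a stopwatch when `ApplicationStopping` fires and log the elapsed time when `ApplicationStopped` fires. `ShutdownDurationLogger` is a hypothetical name, and the log message format is illustrative.

```csharp
using System;
using System.Diagnostics;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Measures the time between "shutdown began" and "shutdown finished".
public sealed class ShutdownDurationLogger
{
    private readonly Stopwatch _stopwatch = new();

    public ShutdownDurationLogger(
        IHostApplicationLifetime lifetime,
        ILogger<ShutdownDurationLogger> logger)
    {
        lifetime.ApplicationStopping.Register(() =>
        {
            _stopwatch.Start();
            logger.LogInformation("Shutdown started");
        });

        lifetime.ApplicationStopped.Register(() =>
        {
            _stopwatch.Stop();
            logger.LogInformation(
                "Shutdown finished after {ElapsedMs} ms",
                _stopwatch.ElapsedMilliseconds);
        });
    }

    // Exposed so the value can also be fed to a metric.
    public TimeSpan Elapsed => _stopwatch.Elapsed;
}
```

The class has to be constructed at startup for the callbacks to register, for example by adding it as a singleton and resolving it once after `builder.Build()`. Whether the final log line reaches your sink depends on how the logging provider buffers and flushes, so confirm that it shows up in practice.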

Checklist

- Readiness flips to NotReady during shutdown to stop new traffic.
- In-flight requests can drain within termination window.
- Request handlers propagate HttpContext.RequestAborted.
- Long operations accept CancellationToken.
- Background services respect stoppingToken and exit cleanly.
- No fire-and-forget critical work in request path.
- Deploy-time error spikes are monitored and alertable.
- Idempotency/resumability exists for work that can be interrupted.
