
Correlation IDs & Trace Context

Correlation is how you turn 500 microservices into one story. If you do not propagate TraceContext and correlation IDs end-to-end, you will waste hours during incidents and still not know which dependency caused the blast.


Production incident

A request fails somewhere in a chain of services. Service A logs a 500. Service B logs a timeout. Service C logs nothing useful. There is no shared trace id, and Request-Id is generated differently in each hop. Engineers grep logs for hours, guess the root cause, and ship blind mitigations. The incident lasts longer than it should because the system has no consistent correlation.

Symptoms

  • You cannot answer: “which downstream call caused this request to fail?”
  • Logs show different IDs per service; you cannot join them.
  • Retries generate new IDs, hiding the original cause.
  • Distributed traces are fragmented or missing across boundaries.

Root causes

  • TraceContext (W3C traceparent/tracestate) is not propagated consistently.
  • Custom correlation IDs are generated but not forwarded to outbound calls.
  • Reverse proxy/gateway overwrites or strips headers unexpectedly.
  • Inbound and outbound correlation IDs are not tied into the logging scope, so log lines cannot be joined back to the request.

Diagnosis

# Find correlation middleware and header usage
grep -R "traceparent\|tracestate\|X-Request-Id\|X-Correlation-Id" -n .
grep -R "Activity" -n .
grep -R "Use" -n . | grep -i "middleware"

Validate on the wire: pick one request and confirm that traceparent is present on every hop from edge to leaf and carries the same trace-id throughout. The first hop where it is missing or changes is where your trace breaks.
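To make the wire check repeatable, the captured headers can be compared mechanically. The sketch below assumes you have already collected the traceparent value seen at each hop of one request, ordered from edge to leaf (for example from access logs or a proxy capture). TraceHopChecker and its methods are hypothetical names, and the parser only handles traceparent version 00 as defined by W3C Trace Context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: checks the traceparent captured at each hop of one request
// (edge first, leaf last) and reports the first hop where the trace breaks.
public static class TraceHopChecker
{
    // W3C traceparent (version 00): version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex).
    // Returns the trace-id, or null if the value is missing or malformed.
    public static string? TryGetTraceId(string? traceparent)
    {
        if (string.IsNullOrEmpty(traceparent)) return null;
        var parts = traceparent.Split('-');
        if (parts.Length != 4) return null;
        string version = parts[0], traceId = parts[1], parentId = parts[2], flags = parts[3];

        if (version.Length != 2 || version == "ff" || !IsLowerHex(version)) return null;
        // All-zero trace-id and parent-id values are invalid per the spec.
        if (traceId.Length != 32 || !IsLowerHex(traceId) || traceId.All(c => c == '0')) return null;
        if (parentId.Length != 16 || !IsLowerHex(parentId) || parentId.All(c => c == '0')) return null;
        if (flags.Length != 2 || !IsLowerHex(flags)) return null;
        return traceId;
    }

    private static bool IsLowerHex(string s) =>
        s.All(c => (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'));

    // Returns the name of the first hop whose traceparent is missing, malformed,
    // or carries a different trace-id than the edge; null if the chain is intact.
    public static string? FirstBrokenHop(IReadOnlyList<(string Service, string? Traceparent)> hops)
    {
        string? expected = null;
        foreach (var (service, header) in hops)
        {
            var traceId = TryGetTraceId(header);
            if (traceId is null) return service;
            expected ??= traceId;              // the edge hop defines the trace-id
            if (traceId != expected) return service;
        }
        return null;
    }
}
```

For hops gateway, A, B where B logged no traceparent, FirstBrokenHop returns "B", which is where to start looking for a proxy or client that drops the header.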

Anti-pattern

  • Generating a new correlation id for every outbound call (breaks end-to-end).
  • Using different header names for the same ID (Request-Id in one service, Correlation-Id in another).
  • Logging correlation IDs but not propagating them to dependencies.

Correct pattern

Use W3C TraceContext as the primary correlation mechanism. Optionally add a human-friendly request id, but bind it to the same request scope and propagate it consistently.
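In .NET the primary mechanism is System.Diagnostics.Activity: an Activity in W3C format already holds the trace-id and span-id that make up a traceparent value. A minimal sketch, assuming no ambient Activity is running; TraceContextDemo and StartRequestActivity are hypothetical names.

```csharp
using System.Diagnostics;

// Sketch: an Activity in W3C format already carries what TraceContext needs.
public static class TraceContextDemo
{
    public static (string Traceparent, string TraceId) StartRequestActivity()
    {
        // W3C is the default Activity ID format since .NET 5; set explicitly here for clarity.
        var activity = new Activity("Inbound.Request");
        activity.SetIdFormat(ActivityIdFormat.W3C);
        activity.Start();
        try
        {
            // Activity.Id has the traceparent shape: 00-<trace-id>-<span-id>-<flags>.
            // TraceId is the value to log as trace_id so logs can be joined with traces.
            return (activity.Id!, activity.TraceId.ToHexString());
        }
        finally
        {
            activity.Stop();
        }
    }
}
```

Logging the trace-id as trace_id next to request_id is what lets a single log line be joined to its distributed trace.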

Inbound: ensure a request id exists

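// Minimal middleware: reuse an inbound X-Request-Id if present, otherwise create one,
// echo it on the response, and attach it to every log event for this request.
// Assumes Serilog is configured with .Enrich.FromLogContext().
app.Use(async (ctx, next) =>
{
    const string Header = "X-Request-Id";

    string rid = ctx.Request.Headers[Header].ToString();
    if (string.IsNullOrWhiteSpace(rid))
    {
        rid = Guid.NewGuid().ToString("N");
        ctx.Request.Headers[Header] = rid; // downstream code (e.g. RequestIdHandler) reads it from here
    }

    // Set before next() so the header is in place before the response starts.
    ctx.Response.Headers[Header] = rid;

    using (Serilog.Context.LogContext.PushProperty("request_id", rid))
    {
        await next();
    }
});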
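The middleware reuses whatever X-Request-Id the client sends. Because that value ends up in logs and may be echoed back, it can be worth validating it first. A minimal sketch, assuming a policy of 1 to 64 characters from [A-Za-z0-9-_]; RequestIds and Normalize are hypothetical names, and the limits are illustrative rather than any standard.

```csharp
using System;
using System.Linq;

// Hypothetical helper: decide whether an inbound X-Request-Id can be reused.
// Assumed policy: 1-64 chars of [A-Za-z0-9-_]; anything else is replaced.
public static class RequestIds
{
    private const int MaxLength = 64;

    public static string Normalize(string? inbound)
    {
        if (!string.IsNullOrEmpty(inbound)
            && inbound.Length <= MaxLength
            && inbound.All(IsAllowed))
        {
            return inbound; // keep the caller's id so both sides can join their logs
        }

        // Missing, too long, or containing characters (e.g. CR/LF) that could corrupt logs.
        return Guid.NewGuid().ToString("N");
    }

    private static bool IsAllowed(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-' || c == '_';
}
```

In the middleware, a call such as RequestIds.Normalize(ctx.Request.Headers[Header].ToString()) would take the place of the emptiness check.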

Outbound: propagate correlation

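  • HttpClient propagates traceparent (and tracestate) from the current Activity automatically when OpenTelemetry/Activity is wired; W3C has been the default Activity ID format since .NET 5.
  • For the custom X-Request-Id, add it once in a DelegatingHandler rather than on every call.
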
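using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Copies the inbound X-Request-Id onto every outbound request sent through this client.
public sealed class RequestIdHandler : DelegatingHandler
{
    private const string Header = "X-Request-Id";
    private readonly IHttpContextAccessor _ctx;

    public RequestIdHandler(IHttpContextAccessor ctx) => _ctx = ctx;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        // HttpContext is null outside a request (e.g. background jobs); skip in that case.
        var rid = _ctx.HttpContext?.Request.Headers[Header].ToString();
        // Don't add a second value if the caller already set the header explicitly.
        if (!string.IsNullOrWhiteSpace(rid) && !request.Headers.Contains(Header))
            request.Headers.TryAddWithoutValidation(Header, rid);

        return base.SendAsync(request, ct);
    }
}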

// Registration
services.AddHttpContextAccessor();
services.AddTransient<RequestIdHandler>();

services.AddHttpClient("Upstream")
    .AddHttpMessageHandler<RequestIdHandler>();
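The point of a DelegatingHandler is that every call through the named client gets the header without per-call code. The sketch below exercises the same pattern outside ASP.NET Core: the request id comes from a delegate instead of IHttpContextAccessor, and a stub terminal handler records what would have been sent. HeaderPropagationHandler and CapturingHandler are hypothetical names; only DelegatingHandler, HttpMessageHandler and HttpClient are framework types.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Same pattern as RequestIdHandler, but the current id comes from a delegate
// so the handler can run without ASP.NET Core.
public sealed class HeaderPropagationHandler : DelegatingHandler
{
    private readonly string _header;
    private readonly Func<string?> _currentValue;

    public HeaderPropagationHandler(string header, Func<string?> currentValue)
    {
        _header = header;
        _currentValue = currentValue;
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        var value = _currentValue();
        if (!string.IsNullOrWhiteSpace(value) && !request.Headers.Contains(_header))
            request.Headers.TryAddWithoutValidation(_header, value);
        return base.SendAsync(request, ct);
    }
}

// Terminal stub: records the X-Request-Id that would have gone over the wire and returns 200.
public sealed class CapturingHandler : HttpMessageHandler
{
    public string? LastRequestId { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        LastRequestId = request.Headers.TryGetValues("X-Request-Id", out var values)
            ? string.Join(",", values)
            : null;
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
    }
}
```

Because of the Contains check, an id set explicitly on a single request is left alone rather than sent twice.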

TraceContext rules

  • Do not invent your own tracing headers unless you have a strong reason.
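  • Let edge proxies generate traceparent if it is missing, but do not rewrite it mid-flight: hops that participate in tracing keep the trace-id and only replace the parent-id with their own span-id.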
  • Make sure gateways/proxies do not strip trace headers.
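The rule against rewriting traceparent mid-flight means each hop keeps the inbound trace-id and only contributes a new span-id. A minimal sketch with System.Diagnostics.Activity, assuming the inbound header value is already in hand; TraceContinuation and ContinueTrace are hypothetical names. In ASP.NET Core the hosting layer normally performs this extraction for incoming requests, so treat the sketch as an illustration of the rule rather than code you need to add.

```csharp
using System.Diagnostics;

// Sketch: a service continuing an inbound trace instead of starting a new one.
public static class TraceContinuation
{
    public static Activity ContinueTrace(string inboundTraceparent)
    {
        var activity = new Activity("Service.HandleRequest");
        // SetParentId accepts a W3C traceparent string; Start() then reuses its
        // trace-id and generates a fresh span-id for this hop.
        activity.SetParentId(inboundTraceparent);
        activity.Start();
        return activity; // caller is responsible for calling Stop()
    }
}
```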

Security and performance impact

  • Performance: the runtime cost is a few extra headers and log properties per request; the payoff is a much shorter MTTR. Trace context also enables sampling and targeted debugging instead of logging everything.
  • Security: correlation IDs must not contain PII. Treat them as identifiers that might be exposed to clients.

Operational notes

  • Monitoring: trace coverage rate (% of requests with trace_id), correlation header propagation failures, and log/trace join success.
  • Rollout: start at the edge/gateway, then move inward service by service. One missing hop breaks the chain.
  • Rollback: if correlation breaks clients due to header policies, keep it server-internal and only echo safe IDs.

Checklist

  • W3C traceparent is propagated end-to-end.
  • request_id is stable per inbound request and echoed back.
  • Correlation is injected into log scope automatically.
  • Proxies/gateways preserve tracing headers.
  • Correlation IDs contain no PII and are safe to expose.