Distributed Tracing Basics

Use distributed tracing to find bottlenecks and failure points across services. It only works if trace context is propagated consistently from one service to the next.


Tracing: What Operators Need

Consistent trace-context propagation (trace_id, span_id) across every service.
Service map to spot hotspots and failing edges.
Ability to filter by endpoint, status, and latency.
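The propagation requirement can be sketched in Python. This sketch assumes the W3C Trace Context `traceparent` header as the wire format (the page does not name one), and `inject`, `extract`, and `start_server_span` are hypothetical helper names for illustration, not a tracing library's API.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# Assumed wire format: W3C Trace Context `traceparent`,
# version-trace_id-parent_id-flags, e.g. 00-<32 hex>-<16 hex>-01.
# This sketch accepts only version 00 and lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(trace_id: str, span_id: str, sampled: bool = True) -> Dict[str, str]:
    """Headers to attach to an outgoing call so the callee joins this trace."""
    flags = "01" if sampled else "00"
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (trace_id, parent_span_id) from incoming headers, or None if absent/invalid.

    Header names are assumed to be lowercased already.
    """
    match = TRACEPARENT_RE.match(headers.get("traceparent", "").strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return trace_id, parent_id


def start_server_span(headers: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Continue the caller's trace when context arrives; otherwise start a new trace."""
    ctx = extract(headers)
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    if ctx is None:
        # Missing or broken context: the "every service starts a new trace" failure.
        return {"trace_id": secrets.token_hex(16), "parent_id": None, "span_id": span_id}
    trace_id, parent_id = ctx
    return {"trace_id": trace_id, "parent_id": parent_id, "span_id": span_id}
```

A service calls `start_server_span` on each incoming request and `inject` with its own `trace_id` and `span_id` on each outgoing call, so the callee records the caller's span as its parent. A tracing library usually does this for you; the point is that every hop must forward the header.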

Minimum Trace Annotations

HTTP method/path/status
Duration and error flag
Dependency spans (db, cache, external APIs)
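A minimal sketch of a span record carrying these annotations, plus a check that reports what a request span is missing. The attribute keys (`http.method`, `http.route`, `http.status_code`), the `Span` fields, and the `dependency` labels are hypothetical choices for this example; a real tracing library defines its own naming conventions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical attribute keys for this sketch; follow your tracing
# library's naming conventions in practice.
REQUIRED_HTTP_KEYS = ("http.method", "http.route", "http.status_code")


@dataclass
class Span:
    name: str
    start_ms: float
    end_ms: float
    error: bool = False                     # the error flag
    dependency: Optional[str] = None        # "db", "cache", "external", ... on dependency spans
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def missing_annotations(span: Span) -> List[str]:
    """List the minimum annotations a request span (and its children) lack."""
    missing = [key for key in REQUIRED_HTTP_KEYS if key not in span.attributes]
    if span.end_ms < span.start_ms:
        missing.append("valid duration")
    # Every child of a request span is treated here as a dependency call.
    for child in span.children:
        if child.dependency is None:
            missing.append(f"dependency type on child span {child.name!r}")
    return missing
```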

Triage Workflow

Pick a failing endpoint (from metrics).
Find traces for that endpoint with high latency or errors.
Identify the slowest span and the failing dependency.
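The three triage steps might look like this over an in-memory list of traces. The `Span` shape, the latency threshold, and the function names are assumptions for illustration; a tracing backend would normally run these filters as queries.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    duration_ms: float
    error: bool = False
    route: Optional[str] = None       # set on the request (root) span
    dependency: Optional[str] = None  # "db", "cache", "external", ... on dependency spans
    children: List["Span"] = field(default_factory=list)


def descendants(span: Span) -> Iterator[Span]:
    """Yield every span below `span`, depth-first."""
    for child in span.children:
        yield child
        yield from descendants(child)


def suspicious_traces(roots: List[Span], route: str, latency_ms: float) -> List[Span]:
    """Step 2: traces for one endpoint that errored or exceeded the latency threshold."""
    return [r for r in roots if r.route == route and (r.error or r.duration_ms > latency_ms)]


def slowest_span(root: Span) -> Optional[Span]:
    """Step 3a: the longest-running span below the request span."""
    return max(descendants(root), key=lambda s: s.duration_ms, default=None)


def failing_dependency(root: Span) -> Optional[Span]:
    """Step 3b: the first errored dependency span, in depth-first order."""
    return next((s for s in descendants(root) if s.error and s.dependency), None)
```

`slowest_span` ranks by total duration, so a parent span that wraps a slow child ranks above that child; comparing self time (duration minus children) would point more directly at the culprit.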

Failure Modes

No propagation: every service starts a new trace.
Sampling that drops errors: if error traces are sampled away, the interesting traces are missing.
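One way to avoid sampling away errors is a tail-based decision made once the whole trace is complete: always keep errored or slow traces and sample only healthy ones. This sketch assumes trace IDs are 32 lowercase hex characters; `TraceSummary`, `keep_trace`, `base_rate`, and `slow_ms` are hypothetical names, and hashing on the trace_id is one common choice rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class TraceSummary:
    trace_id: str       # assumed: 32 lowercase hex chars
    has_error: bool
    duration_ms: float


def keep_trace(t: TraceSummary, base_rate: float, slow_ms: float) -> bool:
    """Tail-based sampling decision, made after the whole trace has been seen.

    Errored and slow traces are always kept; healthy traces are kept with
    probability `base_rate`.
    """
    if t.has_error or t.duration_ms >= slow_ms:
        return True
    # Take the top 53 of the low 64 bits of the trace_id and scale to [0, 1).
    # 53 bits fit exactly in a float, so the value is always strictly below 1.0
    # and base_rate=1.0 keeps everything.
    bucket = (int(t.trace_id[-16:], 16) >> 11) / 2**53
    return bucket < base_rate
```

Deriving the sampling value from the trace_id rather than a fresh random draw means every collector that sees part of the same trace reaches the same decision.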
