Distributed Tracing Basics
Tracing: What Operators Need
- Consistent propagation across services (trace_id, span_id).
- Service map to spot hotspots and failing edges.
- Ability to filter by endpoint, status, and latency.
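As a rough illustration of the propagation requirement, the sketch below carries trace_id and span_id between services in a header shaped like the W3C Trace Context `traceparent` value (`00-<trace_id>-<span_id>-<flags>`). The helper names (`inject`, `extract`, `start_span`) and the plain-dict header model with lowercase keys are assumptions made for this example; a real service would normally rely on its tracing library's propagator.

```python
import secrets


def new_trace_id() -> str:
    return secrets.token_hex(16)  # 32 hex characters


def new_span_id() -> str:
    return secrets.token_hex(8)  # 16 hex characters


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> dict:
    """Return a copy of the outgoing headers with the trace context attached."""
    flags = "01" if sampled else "00"
    out = dict(headers)
    out["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return out


def extract(headers: dict):
    """Return (trace_id, parent_span_id), or None if the header is absent or malformed."""
    value = headers.get("traceparent")
    if not value:
        return None
    parts = value.split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return parts[1], parts[2]


def start_span(incoming_headers: dict) -> dict:
    """Continue the caller's trace when context is present; otherwise start a new one."""
    ctx = extract(incoming_headers)
    if ctx is None:
        trace_id, parent_id = new_trace_id(), None
    else:
        trace_id, parent_id = ctx
    return {"trace_id": trace_id, "span_id": new_span_id(), "parent_id": parent_id}
```

A downstream service that calls `start_span` on the headers produced by `inject` keeps the caller's trace_id and records the caller's span_id as its parent, which is what lets a service map draw the edge between the two.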
Minimum Trace Annotations
- HTTP method/path/status
- Duration and error flag
- Dependency spans (db, cache, external APIs)
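One way to make these annotations concrete is a small span record plus a check for missing fields. This is only a sketch: the `Span` class, the attribute keys (`http.method`, `http.path`, `http.status`, `dependency.type`), and the `kind` values are hypothetical names chosen for this example, not the schema of any particular tracing system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    kind: str  # "server" for an inbound request, "dependency" for db/cache/external calls
    duration_ms: float
    error: bool = False
    attributes: dict = field(default_factory=dict)


# Hypothetical attribute keys for the HTTP annotations listed above.
REQUIRED_HTTP_KEYS = ("http.method", "http.path", "http.status")


def missing_annotations(span: Span) -> list:
    """List the minimum annotations that this span does not carry."""
    missing = []
    if span.kind == "server":
        missing += [k for k in REQUIRED_HTTP_KEYS if k not in span.attributes]
    if span.kind == "dependency" and "dependency.type" not in span.attributes:
        missing.append("dependency.type")
    return missing
```

Duration and the error flag are required constructor fields here, so a span cannot be created without them; only the HTTP and dependency attributes need a separate check.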
Triage Workflow
- Pick a failing endpoint (from metrics).
- Find traces with high latency or errors.
- Identify the slowest span and the failing dependency.
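The workflow above can be sketched as a filter over exported spans. The span layout (dicts with `trace_id`, `parent_id`, `name`, `duration_ms`, `error`), the `triage` function, and the convention that the root span's name is the endpoint are all assumptions for illustration; in practice the same steps are usually run as queries in a tracing backend.

```python
from collections import defaultdict


def triage(spans, endpoint, latency_ms_threshold):
    """For slow or failed traces of one endpoint, report the slowest child span
    and any child spans that recorded an error."""
    by_trace = defaultdict(list)
    for s in spans:
        by_trace[s["trace_id"]].append(s)

    findings = []
    for trace_id, trace_spans in by_trace.items():
        roots = [s for s in trace_spans if s["parent_id"] is None]
        if not roots or roots[0]["name"] != endpoint:
            continue
        root = roots[0]
        # Keep only traces that are slow or errored.
        if not (root["error"] or root["duration_ms"] >= latency_ms_threshold):
            continue
        children = [s for s in trace_spans if s["parent_id"] is not None]
        slowest = max(children, key=lambda s: s["duration_ms"], default=None)
        findings.append({
            "trace_id": trace_id,
            "slowest_span": slowest["name"] if slowest else None,
            "failing_dependencies": [s["name"] for s in children if s["error"]],
        })
    return findings
```

This version compares raw span durations, so a child that merely wraps other slow children would also rank high; computing self time per span is a common refinement.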
Failure Modes
- No propagation: every service starts a new trace, so one request shows up as several disconnected traces.
- Error traces sampled away too aggressively: the interesting traces are dropped before anyone can inspect them.
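A common mitigation for the sampling failure mode is to keep every trace that contains an error and sample only the healthy ones. The sketch below assumes the decision is made once the trace's error status is known (a tail-style decision); `keep_trace` and `base_rate` are hypothetical names for this example.

```python
import hashlib


def keep_trace(trace_id: str, has_error: bool, base_rate: float = 0.1) -> bool:
    """Always keep error traces; keep roughly base_rate of the rest.

    The decision for healthy traces is derived from a hash of trace_id, so
    every service that evaluates the same trace reaches the same answer.
    """
    if has_error:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(base_rate * 2**64)
```

Hashing the trace_id instead of drawing a random number per service avoids partially sampled traces, where some services keep their spans and others drop them.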