Distributed Tracing Basics

Use distributed tracing to find bottlenecks and failure points across services. Traces are only useful when every service propagates trace context consistently.

Tracing: What Operators Need

  • Consistent propagation of trace context (trace_id, span_id) across services.
  • Service map to spot hotspots and failing edges.
  • Ability to filter by endpoint, status, and latency.
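As an illustration of the propagation requirement above, here is a minimal sketch that carries trace_id and span_id between services in a W3C Trace Context style `traceparent` header. The helper names (`extract_context`, `inject_context`) are hypothetical, and the sketch assumes header keys are already lowercased; in practice a tracing library would normally handle this for you.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# traceparent layout: version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    """Random 16-byte trace id as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def new_span_id() -> str:
    """Random 8-byte span id as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def extract_context(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (trace_id, parent_span_id) from incoming headers.

    If the header is missing or malformed, start a new trace with no parent.
    Assumption: header names are lowercase.
    """
    match = _TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match:
        trace_id, parent_id = match.group(1), match.group(2)
        # All-zero ids are invalid and are treated like a missing header.
        if trace_id != "0" * 32 and parent_id != "0" * 16:
            return trace_id, parent_id
    return new_trace_id(), None


def inject_context(trace_id: str, span_id: str, headers: Dict[str, str],
                   sampled: bool = True) -> Dict[str, str]:
    """Return a copy of the outgoing headers with the current span as parent."""
    out = dict(headers)
    flags = "01" if sampled else "00"
    out["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return out
```

A service would call `extract_context` on each incoming request, create its own span id with `new_span_id`, and pass that id to `inject_context` on every outgoing call so that downstream spans join the same trace.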

Minimum Trace Annotations

  • HTTP method/path/status
  • Duration and error flag
  • Dependency spans (db, cache, external APIs)
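One way to make the minimum annotations above concrete is a small span record plus a check for missing fields. This is a sketch under stated assumptions: the `Span` class, the `kind` values, and the attribute keys (`http.method`, `http.path`, `http.status_code`) are illustrative names, not the schema of any particular tracing library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative attribute keys for the HTTP annotations; adapt to your tracer's conventions.
REQUIRED_HTTP_KEYS = ("http.method", "http.path", "http.status_code")


@dataclass
class Span:
    trace_id: str
    span_id: str
    name: str
    duration_ms: float               # duration of the operation
    error: bool = False              # error flag
    parent_id: Optional[str] = None  # None for the root span of a trace
    kind: str = "server"             # hypothetical kinds: "server", "db", "cache", "external"
    attributes: Dict[str, object] = field(default_factory=dict)


def missing_annotations(span: Span) -> List[str]:
    """List required HTTP keys absent from a server span.

    Dependency spans (db, cache, external) are not checked in this sketch.
    """
    if span.kind != "server":
        return []
    return [key for key in REQUIRED_HTTP_KEYS if key not in span.attributes]
```

A check like `missing_annotations` could run in tests or in a collector-side validation step to catch services that emit spans without the fields operators filter on.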

Triage Workflow

  1. Pick a failing endpoint (from metrics).
  2. Find traces with high latency or errors.
  3. Identify the slowest span and the failing dependency.
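The workflow above can be sketched over exported span data. The example assumes each span is a plain dict with `trace_id`, `parent_id`, `name`, `duration_ms`, and `error` keys; that shape and the `triage` function are hypothetical, standing in for whatever query interface your tracing backend provides.

```python
from collections import defaultdict
from typing import Dict, List


def triage(spans: List[dict], endpoint: str, latency_threshold_ms: float) -> List[dict]:
    """For traces of `endpoint` that are slow or contain errors, report the
    slowest child span and the names of dependency spans flagged as errors."""
    by_trace: Dict[str, List[dict]] = defaultdict(list)
    for span in spans:
        by_trace[span["trace_id"]].append(span)

    findings = []
    for trace_id, trace_spans in by_trace.items():
        # Step 1: keep only traces whose root span is the chosen endpoint.
        root = next((s for s in trace_spans if s["parent_id"] is None), None)
        if root is None or root["name"] != endpoint:
            continue
        # Step 2: keep traces with high latency or any error.
        slow = root["duration_ms"] >= latency_threshold_ms
        errored = any(s["error"] for s in trace_spans)
        if not (slow or errored):
            continue
        # Step 3: slowest non-root span (by total duration) and failing dependencies.
        children = [s for s in trace_spans if s is not root]
        slowest = max(children, key=lambda s: s["duration_ms"], default=None)
        findings.append({
            "trace_id": trace_id,
            "slowest_span": slowest["name"] if slowest else None,
            "failing_dependencies": [s["name"] for s in children if s["error"]],
        })
    return findings
```

Note that this ranks spans by total duration rather than self time, so a parent span that merely waits on a slow child can rank first; a real tool would usually subtract child time before comparing.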

Failure Modes

  • No propagation: every service starts a new trace.
  • Error traces sampled out: overly aggressive sampling drops the interesting traces.
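One common way to avoid losing the interesting traces is to decide what to keep after a trace has finished, always retaining errors and slow requests and sampling only the rest. The sketch below assumes that kind of tail-based decision; the function name, thresholds, and rates are hypothetical defaults, not recommendations.

```python
import random
from typing import Callable


def keep_trace(has_error: bool, duration_ms: float,
               slow_threshold_ms: float = 1000.0,
               base_rate: float = 0.1,
               rng: Callable[[], float] = random.random) -> bool:
    """Decide whether to retain a completed trace.

    Errors and slow traces are always kept; everything else is kept
    with probability `base_rate`.
    """
    if has_error:
        return True
    if duration_ms >= slow_threshold_ms:
        return True
    # rng() returns a float in [0.0, 1.0), so base_rate=0.0 keeps nothing here.
    return rng() < base_rate
```

This only works where the error flag and duration are known at decision time. A sampler that decides at the start of a request cannot see either, which is how error traces end up being dropped.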