PYTHON Contents

Metrics Basics (What to Measure)

Use metrics to quantify reliability: track rates, errors, latency, and saturation so you can alert on symptoms, not guesses.

On this page

Metrics Answer "How Is It Doing?"

Metrics provide time-series signals for alerting and capacity planning. Focus on a small set that maps to user impact.

Core Metric Types

  • Counters: requests_total, errors_total
  • Gauges: queue_depth, memory_bytes
  • Histograms/Summaries: latency distributions

Minimal Metric Plan

# Conceptual metrics:
# requests_total{route, status}
# request_latency_ms{route} histogram
# errors_total{type}
# queue_depth gauge

Operational Checklist

  • Define SLO-aligned metrics (latency, error rate).
  • Use labels sparingly; avoid high-cardinality labels (user_id).
  • Alert on symptoms (error rate, latency) not causes (CPU alone).

Failure Modes

  • High cardinality: metrics backends melt under too many label values.
  • Vanity metrics: numbers that do not reflect user impact.