Metrics Basics (What to Measure)
Metrics Answer "How Is It Doing?"
Metrics provide time-series signals for alerting and capacity planning. Focus on a small set that maps to user impact.
Core Metric Types
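- Counters: values that only increase, such as requests_total and errors_total. Query them as rates over time.
- Gauges: values that can rise and fall, such as queue_depth and memory_bytes.
- Histograms/Summaries: distributions of observed values, such as request latency, from which percentiles can be derived.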
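To make the three types concrete, here is a minimal sketch in plain Python. The class names `Counter`, `Gauge`, and `Histogram` and the bucket bounds are illustrative assumptions, not the API of any particular metrics library; a real client library would also handle labels, concurrency, and export.

```python
import bisect


class Counter:
    """Monotonically increasing value (hypothetical helper, not a library API)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Point-in-time value that can go up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations into buckets defined by upper bounds."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One extra slot for observations above the largest bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, value):
        # bisect_left returns the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value


# Usage with the metric names from the list above (bucket bounds are assumed).
requests_total = Counter()
queue_depth = Gauge()
latency_ms = Histogram([50, 100, 250])

requests_total.inc()
queue_depth.set(7)
for ms in (12, 100, 300):
    latency_ms.observe(ms)
```

The counter rejects negative increments because its meaning depends on never decreasing, while the gauge simply stores the latest value.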
Minimal Metric Plan
# Conceptual metrics:
# requests_total{route, status} counter
# request_latency_ms{route} histogram
# errors_total{type} counter
# queue_depth gauge
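One way the labeled counters in this plan could be recorded is sketched below. `LabeledCounter` and `record_request` are hypothetical names introduced for illustration, and the sketch assumes each distinct combination of label values is stored as its own series.

```python
from collections import defaultdict


class LabeledCounter:
    """Counter with a fixed set of label names (hypothetical helper)."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        # Each distinct tuple of label values is a separate series.
        self.series = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        if set(labels) != set(self.label_names):
            raise ValueError(f"{self.name} expects labels {self.label_names}")
        key = tuple(labels[name] for name in self.label_names)
        self.series[key] += amount


requests_total = LabeledCounter("requests_total", ["route", "status"])
errors_total = LabeledCounter("errors_total", ["type"])


def record_request(route, status, error_type=None):
    """Update the plan's counters for one handled request."""
    requests_total.inc(route=route, status=str(status))
    if error_type is not None:
        errors_total.inc(type=error_type)


record_request("/checkout", 200)
record_request("/checkout", 500, error_type="timeout")
```

Fixing the label names at construction time keeps callers from adding ad hoc labels, which is one simple guard against accidental cardinality growth.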
Operational Checklist
- Define SLO-aligned metrics (latency, error rate).
- Use labels sparingly; avoid high-cardinality labels (user_id).
- Alert on symptoms (error rate, latency), not causes (CPU alone).
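As a sketch of symptom-based alerting, the error rate can be derived from two samples of the request and error counters taken at the start and end of a window. The function names and the 1% threshold are assumptions for illustration; the right threshold depends on your SLO.

```python
def error_rate(requests_before, requests_after, errors_before, errors_after):
    """Fraction of requests that failed between two counter samples."""
    requests = requests_after - requests_before
    if requests <= 0:
        # No traffic in the window (or a counter reset): report no errors.
        return 0.0
    return (errors_after - errors_before) / requests


def should_alert(rate, threshold=0.01):
    """Fire when the error rate exceeds the threshold (assumed 1%)."""
    return rate > threshold


# 500 requests and 15 errors in the window.
window_rate = error_rate(1000, 1500, 10, 25)
alerting = should_alert(window_rate)
```

Working from counter deltas rather than raw totals means the alert reflects recent behavior, not the whole lifetime of the process.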
Failure Modes
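- High cardinality: every distinct combination of label values creates a separate time series, so unbounded labels can exhaust a metrics backend's memory and slow its queries.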
- Vanity metrics: numbers that do not reflect user impact.
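The cardinality risk can be estimated before a label ships. The sketch below assumes the worst case, where every combination of label values occurs, so the series count is the product of the distinct values per label. The function name and the value counts are made up for illustration.

```python
from math import prod


def series_count(label_values):
    """Worst-case number of time series for one metric.

    label_values maps each label name to the collection of values it can take.
    A metric with no labels has exactly one series.
    """
    return prod(len(values) for values in label_values.values())


# Hypothetical label sets: 20 routes and 5 status codes.
bounded = series_count({"route": range(20), "status": range(5)})

# Adding a user_id label with 100,000 distinct users multiplies the total.
unbounded = series_count(
    {"route": range(20), "status": range(5), "user_id": range(100_000)}
)
```

Under these assumed numbers the metric grows from 100 series to 10,000,000, which is why identifiers such as user_id belong in logs or traces rather than metric labels.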