Metrics Design Principles
Actionable Metrics Principles
- Measure outcomes first (availability, latency, errors), then causes (CPU, queues).
- Prefer ratios and percentiles over raw counts when possible: an error ratio or a p99 stays comparable as traffic changes, while a raw count does not.
- Keep naming and labels consistent across services.
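To make the ratio and percentile principle concrete, here is a minimal sketch in plain Python. The function names `error_ratio` and `percentile` are illustrative assumptions, and the nearest-rank method is only one of several percentile definitions; a real metrics backend will usually compute these from histograms rather than raw samples.

```python
import math


def error_ratio(errors: int, total: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample list, with 0 < p <= 100."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds, with two slow outliers.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 17, 900]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

With these samples the median is 15 ms while p95 is 900 ms, which is the kind of tail behavior an average or a raw request count would hide.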
Baseline Metrics Set
- Requests: request rate (rps), error rate by status class (4xx/5xx), latency p50/p95/p99
- Dependencies: timeouts, retries, saturation, pool usage
- Resources: CPU, memory, disk, network, GC (if relevant)
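One way to keep the baseline set enforceable is to treat it as a checklist and compare it against what a service actually exports. The sketch below assumes a hypothetical naming scheme; every metric name in `BASELINE` is invented for illustration and should be replaced with your own convention. GC metrics are left out because they apply only to some runtimes.

```python
# Hypothetical metric names, grouped as in the baseline list above.
BASELINE: dict[str, set[str]] = {
    "requests": {
        "request_rate", "error_ratio",
        "latency_p50", "latency_p95", "latency_p99",
    },
    "dependencies": {
        "dependency_timeouts", "dependency_retries",
        "dependency_saturation", "pool_usage",
    },
    "resources": {"cpu_usage", "memory_usage", "disk_usage", "network_io"},
}


def missing_baseline(exported: set[str]) -> dict[str, set[str]]:
    """Return, per group, the baseline metrics a service does not export."""
    gaps: dict[str, set[str]] = {}
    for group, names in BASELINE.items():
        absent = names - exported
        if absent:
            gaps[group] = absent
    return gaps
```

A check like this could run in CI or at service startup, so a gap in the baseline is noticed before an incident rather than during one.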
Export Hygiene
- Do not attach user_id, email, or request_id as metric labels: these identifiers are unbounded and often personal data.
- Keep labels low-cardinality: region, endpoint group, status class.
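The label rules above can be sketched as a small normalization step applied before a metric is recorded. The helper names (`status_class`, `endpoint_group`, `safe_labels`) are assumptions made for this example, and grouping endpoints by their first path segment is one simple heuristic among many; a route template from your web framework is often a better source.

```python
# Labels that must never reach the metrics backend (from the list above).
FORBIDDEN_LABELS = {"user_id", "email", "request_id"}


def status_class(status_code: int) -> str:
    """Collapse an HTTP status code into its class, e.g. 503 becomes '5xx'."""
    return f"{status_code // 100}xx"


def endpoint_group(path: str) -> str:
    """Group a request path by its first segment, dropping IDs and sub-paths."""
    segments = [s for s in path.split("/") if s]
    return "/" + segments[0] if segments else "/"


def safe_labels(raw: dict[str, str]) -> dict[str, str]:
    """Return a copy of the labels without any forbidden keys."""
    return {k: v for k, v in raw.items() if k not in FORBIDDEN_LABELS}


labels = safe_labels({
    "region": "eu-west",
    "endpoint": endpoint_group("/orders/123/items"),
    "status": status_class(503),
    "user_id": "u-48213",  # dropped by safe_labels
})
```

Here `labels` ends up with only `region`, `endpoint` (`/orders`), and `status` (`5xx`), each of which has a small, bounded set of possible values.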
Failure Modes
- High-cardinality labels: every distinct combination of label values becomes its own time series, which leads to storage blowups and query timeouts.
- Too many metrics: operators cannot see what matters.
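A quick worst-case estimate shows why high-cardinality labels are the more dangerous failure mode. The sketch below multiplies the number of distinct values per label to get an upper bound on the number of time series for one metric. The function name `max_series` and the value counts are illustrative assumptions, and real series counts are usually lower because not every combination occurs.

```python
import math


def max_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return math.prod(label_value_counts.values())


# Hypothetical counts for a well-behaved label set.
bounded = {"region": 5, "endpoint_group": 20, "status_class": 5}

# The same metric with an unbounded identifier added as a label.
unbounded = {**bounded, "user_id": 100_000}

bounded_series = max_series(bounded)
unbounded_series = max_series(unbounded)
```

Under these assumed counts the bounded label set tops out at 500 series, while adding `user_id` raises the bound to 50,000,000 for a single metric.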