Metrics Design Principles

Design actionable metrics: what to measure, how to avoid noisy signals, and how to build reliable alert foundations.

Actionable Metrics Principles

  • Measure outcomes first (availability, latency, errors), then causes (CPU, queues).
  • Prefer ratios and percentiles over raw counts when possible: raw counts scale with traffic, and averages hide tail latency.
  • Keep naming and labels consistent across services.
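
A minimal sketch of two of the principles above: a ratio instead of a raw count, and a check for consistent metric names. The naming convention (lowercase snake_case with at least two segments) and the function names are assumptions made for illustration, not a standard this page prescribes.

```python
import re

# Assumed convention for this sketch: lowercase snake_case with at least
# two segments, e.g. "checkout_http_requests_total".
METRIC_NAME_RE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+")


def is_consistent_name(name: str) -> bool:
    """Return True if the metric name follows the assumed convention."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def error_ratio(errors: int, total: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


# The same raw error count means very different things at different traffic levels.
busy = error_ratio(errors=50, total=100_000)  # a small fraction of requests
quiet = error_ratio(errors=50, total=100)     # half of all requests
```

Alerting on the ratio rather than the count keeps one threshold meaningful for both a busy and a quiet service.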

Baseline Metrics Set

  • Requests: request rate (rps), error rate split by status class (4xx and 5xx), latency at p50/p95/p99
  • Dependencies: timeouts, retries, saturation, pool usage
  • Resources: CPU, memory, disk, network, GC (if relevant)
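
The request-level part of this baseline can be sketched from raw samples as follows. `RequestSample`, `percentile`, and `summarize` are hypothetical names, and the nearest-rank percentile is one simple choice. Production systems typically derive percentiles from histogram buckets rather than from stored samples.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    status: int        # HTTP status code
    latency_ms: float  # observed request latency in milliseconds


def percentile(sorted_values: list, p: int) -> float:
    """Nearest-rank percentile of an ascending list, for an integer p in 1..100."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = (p * len(sorted_values) + 99) // 100  # ceil(p * n / 100) in integer math
    return sorted_values[rank - 1]


def summarize(samples: list, window_seconds: float) -> dict:
    """Request baseline for one window: rate, error rates, latency percentiles."""
    if not samples:
        raise ValueError("no samples in window")
    total = len(samples)
    latencies = sorted(s.latency_ms for s in samples)
    count_4xx = sum(1 for s in samples if 400 <= s.status < 500)
    count_5xx = sum(1 for s in samples if 500 <= s.status < 600)
    return {
        "rps": total / window_seconds,
        "error_rate_4xx": count_4xx / total,
        "error_rate_5xx": count_5xx / total,
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "latency_p99_ms": percentile(latencies, 99),
    }
```

Keeping 4xx and 5xx separate lets client mistakes and server failures be alerted on with different thresholds.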

Export Hygiene

  • Do not attach user_id, email, or request_id as metric labels: their values are unbounded, and some of them are personal data.
  • Keep labels low-cardinality, for example region, endpoint group, and status class.
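
One way to apply these rules at export time is to build labels from a small fixed set and drop anything on a deny list. This is a sketch under stated assumptions: the helper names are hypothetical, and treating the first path segment as the "endpoint group" is an illustrative choice that a real service would replace with its own routing table.

```python
# Labels named on this page as unsafe to export.
FORBIDDEN_LABELS = {"user_id", "email", "request_id"}


def status_class(code: int) -> str:
    """Collapse an HTTP status code into its class, e.g. 503 -> '5xx'."""
    return f"{code // 100}xx"


def endpoint_group(path: str) -> str:
    """Assumed grouping: keep only the first path segment, so IDs never leak in."""
    segments = [s for s in path.split("/") if s]
    return "/" + segments[0] if segments else "/"


def build_labels(region: str, path: str, status_code: int) -> dict:
    """Build the low-cardinality label set for one request."""
    return {
        "region": region,
        "endpoint_group": endpoint_group(path),
        "status_class": status_class(status_code),
    }


def strip_forbidden(labels: dict) -> dict:
    """Drop any label whose key is on the deny list."""
    return {k: v for k, v in labels.items() if k not in FORBIDDEN_LABELS}


labels = build_labels("eu-west-1", "/orders/12345/items", 503)
```

Per-request identifiers such as request_id are better carried in logs or traces, where high cardinality is expected.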

Failure Modes

  • High-cardinality labels: every distinct combination of label values creates a new time series, which leads to storage blowups and query timeouts.
  • Too many metrics: operators cannot see what matters.
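
The cardinality failure mode can be estimated before a metric ships. The sketch below computes a worst-case series count as the product of distinct values per label. The label counts and the budget of 10,000 series are made-up numbers for illustration.

```python
from math import prod


def max_series(label_value_counts: dict) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


def over_budget(label_value_counts: dict, budget: int) -> bool:
    """True if the worst-case series count exceeds the given budget."""
    return max_series(label_value_counts) > budget


# Hypothetical label sets: the second adds a single unbounded label.
bounded = {"region": 5, "endpoint_group": 20, "status_class": 5}
unbounded = {**bounded, "user_id": 1_000_000}

SERIES_BUDGET = 10_000  # assumed per-metric budget

bounded_series = max_series(bounded)      # 5 * 20 * 5
unbounded_series = max_series(unbounded)  # the same, times 1,000,000
```

A check like `over_budget` could run in code review or CI so that an unbounded label is caught before it reaches the metrics backend.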