Resource Metrics Primer (what to alert on)

Know what to alert on: CPU, memory, disk, and network signals that predict outages early.

Metrics That Predict Outages

  • CPU: utilization, run queue length, throttling (cgroup CPU quota).
  • Memory: working set, swap in/out, major page faults, OOM kills.
  • Disk: latency, %util, queue depth, free space, inode usage.
  • Network: retransmits, drops, RTT spikes.
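Several of these signals (CPU throttling, swap in/out, major faults, OOM kills, TCP retransmits) are cumulative kernel counters, so the useful number is how much they grew over an interval. The bash sketch below compares two snapshots of such files. It assumes a Linux host: cgroup v2 `cpu.stat` with `nr_periods` and `nr_throttled`, `/proc/vmstat` keys such as `pswpin`, `pswpout`, `pgmajfault` and `oom_kill` (the last one only on recent kernels), and the `OutSegs`/`RetransSegs` fields of `/proc/net/snmp`. All function names are hypothetical helpers, not part of any tool.

```bash
# Sketch only: kv_get, counter_delta, pct, throttle_pct, tcp_get and
# tcp_retrans_pct are hypothetical helper names, not part of any tool.
# Each one compares two snapshots of cumulative Linux counter files.

# kv_get FILE KEY -> value of KEY in a "key value" file such as
# /proc/vmstat or a cgroup v2 cpu.stat (prints nothing if KEY is absent)
kv_get() {
  awk -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# counter_delta OLD NEW KEY -> growth of KEY between two snapshots (missing key counts as 0)
counter_delta() {
  local old new
  old=$(kv_get "$1" "$3")
  new=$(kv_get "$2" "$3")
  echo $(( ${new:-0} - ${old:-0} ))
}

# pct NUM DEN -> NUM/DEN as a percentage with two decimals (0.00 when DEN is 0)
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { if (d == 0) print "0.00"; else printf "%.2f\n", n * 100 / d }'
}

# throttle_pct OLD NEW -> share of quota periods that were throttled, from two cpu.stat snapshots
throttle_pct() {
  pct "$(counter_delta "$1" "$2" nr_throttled)" "$(counter_delta "$1" "$2" nr_periods)"
}

# tcp_get FILE FIELD -> FIELD from the header/value pair of "Tcp:" lines in /proc/net/snmp
tcp_get() {
  awk -v f="$2" '$1 == "Tcp:" {
    if (!seen) { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1 }
    else { if (f in col) print $col[f]; exit }
  }' "$1"
}

# tcp_retrans_pct OLD NEW -> retransmitted segments as a % of segments sent between snapshots
tcp_retrans_pct() {
  local o0 o1 r0 r1
  o0=$(tcp_get "$1" OutSegs);     o1=$(tcp_get "$2" OutSegs)
  r0=$(tcp_get "$1" RetransSegs); r1=$(tcp_get "$2" RetransSegs)
  pct $(( ${r1:-0} - ${r0:-0} )) $(( ${o1:-0} - ${o0:-0} ))
}
```

For example, copy `/proc/vmstat` to a file, wait a minute, copy it again, and pass both copies to `counter_delta` with `oom_kill` or `pswpout`. Inside a cgroup v2 container the same pattern typically works with `/sys/fs/cgroup/cpu.stat` and `throttle_pct`; on a host, point it at the service's own cgroup directory instead.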

On-Host Quick Checks

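uptime                 # 1-, 5- and 15-minute load averages
vmstat 1 5             # 5 samples, 1 s apart: r = run queue, si/so = swap in/out, wa = I/O wait
free -h                # judge headroom by "available", not "free"
df -h                  # disk space used/free per filesystem
df -i                  # inodes used/free per filesystem
iostat -xz 1 5 2>/dev/null || true   # per-device latency, queue size, %util (needs sysstat; silently skipped if absent)
# In vmstat and iostat output the first report averages since boot: read the later samples.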
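The `r` column of `vmstat` counts runnable processes, and a run queue that stays above the CPU count is a common sign of CPU saturation. A minimal bash sketch, assuming procps-style `vmstat` output (a group line, a column-header line, then samples); `max_runq` is a hypothetical helper name and the "above CPU count" reading is a rule of thumb, not a hard limit.

```bash
# max_runq (hypothetical helper): read `vmstat INTERVAL COUNT` output on stdin,
# find the "r" (runnable processes) column from the header line, skip the first
# sample (an average since boot) and print the largest remaining value.
max_runq() {
  awk '
    !col { for (i = 1; i <= NF; i++) if ($i == "r") col = i; next }
    $col ~ /^[0-9]+$/ { if (++n == 1) next; if ($col + 0 > max + 0) max = $col }
    END { print max + 0 }
  '
}
```

On a live host, compare `vmstat 1 5 | max_runq` with `nproc`. Locating the column by name rather than position keeps the sketch working if a `vmstat` version adds or reorders columns, and repeated header lines are skipped because their `r` field is not numeric.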

Alerting Rules of Thumb

  • Alert on symptoms (latency, errors) and pair them with cause signals (saturation) that explain them or warn before users notice.
  • Avoid paging on noisy metrics; prefer multi-signal alerts.
  • Use burn-rate alerts for SLOs where possible.
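Burn rate is the observed error ratio divided by the error budget (1 - SLO target); at 1.0 the budget lasts exactly the SLO window. A minimal bash sketch, assuming you already have error and request counts per window; `burn_rate` and `should_page` are hypothetical helper names. The 14.4 threshold in the note below is the value the Google SRE Workbook suggests for a 1-hour window against a 30-day SLO (2% of the budget spent in one hour); treat it as a starting point.

```bash
# burn_rate ERRORS TOTAL SLO -> (ERRORS / TOTAL) / (1 - SLO), two decimals
# (hypothetical helper; prints 0.00 when there were no requests)
burn_rate() {
  awk -v e="$1" -v t="$2" -v s="$3" 'BEGIN {
    if (t == 0) { print "0.00"; exit }
    printf "%.2f\n", (e / t) / (1 - s)
  }'
}

# should_page LONG_RATE SHORT_RATE THRESHOLD -> exit status 0 (page) only when
# both the long and the short window burn at or above THRESHOLD (hypothetical helper)
should_page() {
  awk -v l="$1" -v s="$2" -v th="$3" 'BEGIN { exit !(l >= th && s >= th) }'
}
```

For example, `should_page "$(burn_rate 180 10000 0.999)" "$(burn_rate 20 900 0.999)" 14.4 && echo page` with 1-hour and 5-minute counts. Requiring both windows is itself a multi-signal alert: the long window keeps brief blips from paging, and the short window lets the alert clear soon after errors stop.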

Common Misreads

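  • High CPU utilization can be fine if latency is stable and there is no queueing (the run queue stays at or below the CPU count).
  • Low 'free' memory is normal: Linux fills idle RAM with page cache and reclaims it on demand, so check 'available' instead.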
  • High throughput can coexist with terrible latency.
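Because the kernel reclaims page cache on demand, `MemAvailable` in `/proc/meminfo` (the source of the "available" column in `free` on modern kernels) is a better headroom signal than `MemFree`. A minimal bash sketch; `mem_available_pct` is a hypothetical helper name, and it takes the file path as an argument so it can be checked against a saved copy.

```bash
# mem_available_pct [FILE] -> MemAvailable as a percentage of MemTotal
# (hypothetical helper; defaults to /proc/meminfo, values there are in kB)
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%.1f\n", avail * 100 / total; else print "0.0" }
  ' "${1:-/proc/meminfo}"
}
```

A host can show a small `MemFree` and still report half its memory as available; alerting on the available percentage, together with swap activity and OOM kills, avoids paging on healthy cache use.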

Checklist

# CPU: utilization + run queue + throttling
# Mem: swap + OOM + major faults
# Disk: await/%util + free space + inodes
# Net: retransmits/drops + RTT
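The free-space and inode items of the checklist can be checked with the same filter, because `df -P` and `df -Pi` (GNU coreutils) both put the use percentage in column 5 and the mount point in column 6. A minimal bash sketch; `over_threshold` is a hypothetical helper name, the 90% limit in the note is illustrative, and mount points containing spaces are assumed not to occur.

```bash
# over_threshold LIMIT (hypothetical helper): read `df -P` or `df -Pi` output on
# stdin and print "MOUNT PCT%" for each filesystem whose use % (column 5) is >= LIMIT.
# Rows with a non-numeric use column (e.g. "-" for filesystems without inodes) are skipped.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    pct = $5; sub(/%$/, "", pct)
    if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, pct "%"
  }'
}
```

On a host, `df -P | over_threshold 90` covers space and `df -Pi | over_threshold 90` covers inodes; a filesystem can run out of inodes while `df -h` still shows free space.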