INFRA-DEVOPS Contents

Alerting Strategy and Alert Fatigue

Build alerting that wakes humans only when necessary: routing, dedupe, severity, and actionable messages.

On this page

Alerting Goals

  • Wake humans only for actionable, time-sensitive issues.
  • Every alert must answer: what broke, where, impact, next step.
  • Route to the right owner with correct severity.

Severity Guidelines

  • SEV1: user-impacting outage, immediate response.
  • SEV2: partial outage/brownout, rapid response.
  • SEV3: degradation, investigate in business hours.

Alert Message Template

Title: API p95 latency > 800ms (prod, eu-west)
Impact: ~25% traffic affected, errors rising
Signals: p95=1.4s, 5xx=2.1%, upstream timeouts
Next: check dependency "payments" latency + recent deploy
Links: dashboard, logs query, runbook, last rollout

Failure Modes

  • Alert storms: no dedupe/silence strategy.
  • Non-actionable alerts: people ignore pages and miss real incidents.