Dashboards That Drive Action
Dashboards That Operators Use
- Start with a "service overview": traffic, errors, latency, saturation.
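- Provide drill-down links from each panel: a pre-filtered logs query, a traces filter, and the runbook.
- Overlay deploy markers and config changes on the time-series panels, so a shift in a metric can be matched to a change.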
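A minimal sketch of how drill-down links might be generated so that each one carries the panel's service and time range. The base URLs, the query parameter names, and the `drilldown_links` helper are all hypothetical. Substitute whatever your logging, tracing, and runbook tools actually accept.

```python
from urllib.parse import urlencode

# Hypothetical base URLs; replace with the ones for your own tooling.
LOGS_URL = "https://logs.example.com/search"
TRACES_URL = "https://traces.example.com/search"
RUNBOOK_URL = "https://runbooks.example.com"


def drilldown_links(service: str, start: str, end: str) -> dict[str, str]:
    """Build links that keep the panel's service and time window."""
    # Parameter names ("query", "from", "to", "error") are assumptions.
    window = {"from": start, "to": end}
    logs_query = urlencode({"query": f"service={service} status>=500", **window})
    traces_query = urlencode({"service": service, "error": "true", **window})
    return {
        "logs": f"{LOGS_URL}?{logs_query}",
        "traces": f"{TRACES_URL}?{traces_query}",
        "runbook": f"{RUNBOOK_URL}/{service}",
    }


links = drilldown_links("checkout", "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z")
for name, url in links.items():
    print(f"{name}: {url}")
```

Passing the time window through means the operator lands on the same interval they were looking at, rather than on a default "last 15 minutes" view.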
Core Panels
- RPS and error rate (stacked by status class)
- Latency percentiles (p50/p95/p99)
- Dependency latency/errors
- Resource saturation (CPU/memory/disk/net)
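To make the first two panels concrete, here is a rough sketch of how their values could be derived from raw request records over one time window. The `Request` record, the `overview` function, and the choice to count only 5xx responses as errors are assumptions for illustration. In practice a metrics backend usually computes these from counters and histograms, not from individual requests.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # request duration in milliseconds


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def overview(requests: list[Request], window_seconds: float) -> dict:
    """Summarize one window: RPS, status classes, error rate, latency."""
    if not requests:
        raise ValueError("no requests in window")
    latencies = sorted(r.latency_ms for r in requests)
    by_class = Counter(f"{r.status // 100}xx" for r in requests)
    return {
        "rps": len(requests) / window_seconds,
        "by_status_class": dict(by_class),
        # Assumption: only 5xx responses count as errors.
        "error_rate": by_class.get("5xx", 0) / len(requests),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }
```

Reporting p50 next to p95 and p99 shows whether a slowdown affects most requests or only the tail.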
Checklist
- Can I answer "is the service healthy" in 10 seconds?
- Can I find "what changed" (a deploy or config change) at a glance?
- Can I pivot to logs/traces in one click?
Failure Modes
- Pretty dashboards that support no decisions: panels have no thresholds and link to no runbooks.
- Too many panels: slow load and cognitive overload.
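Both failure modes can be caught mechanically if dashboards are defined as data. The sketch below assumes a simplified `Panel` record and a panel budget of 12. Neither comes from any particular dashboard tool, and the budget is an arbitrary starting point to tune.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_PANELS = 12  # assumed budget; tune it for your team


@dataclass
class Panel:
    title: str
    thresholds: list[float] = field(default_factory=list)
    runbook_url: Optional[str] = None


def lint_dashboard(panels: list[Panel], max_panels: int = MAX_PANELS) -> list[str]:
    """Return one message per problem found; an empty list means clean."""
    problems = []
    if len(panels) > max_panels:
        problems.append(f"{len(panels)} panels exceeds the budget of {max_panels}")
    for panel in panels:
        if not panel.thresholds:
            problems.append(f"{panel.title}: no thresholds")
        if not panel.runbook_url:
            problems.append(f"{panel.title}: no runbook link")
    return problems


panels = [
    Panel("Error rate", [0.01], "https://runbooks.example.com/checkout"),
    Panel("CPU"),
]
for problem in lint_dashboard(panels):
    print(problem)
```

A check like this could run in review whenever a dashboard definition changes, so that panels with no action attached are questioned before they ship.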