Virtualization vs Containers (Failure Domains)
Failure Domains (Ops View)
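- VM: stronger isolation. Each guest runs its own kernel, so a kernel issue is less likely to affect all guests at once, but the hypervisor is still shared by every guest on the host.
- Container: shares the host kernel, so a misconfiguration or kernel bug on the host can impact many workloads at once.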
What Changes in Troubleshooting
- In containers: check cgroup limits, throttling, and filesystem layers.
- In VMs: check steal time, noisy neighbors on hypervisor, and IO virtualization.
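For the container side, CPU throttling can be read directly from the cgroup's cpu.stat file. The following is a minimal sketch, assuming cgroup v2 and its nr_periods / nr_throttled counters; the throttle_pct helper, the CPU_STAT variable, and the default path are illustrative assumptions, not part of any standard tool.

```shell
#!/bin/sh
# Sketch: share of CPU scheduling periods in which a cgroup was throttled.
# Assumes the cgroup v2 cpu.stat format ("key value" per line).
throttle_pct() {
    # $1: path to a cpu.stat file
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            # Counters are absent when no CPU limit applies; report 0.0 then.
            if (periods > 0) printf "%.1f\n", 100 * throttled / periods
            else             print "0.0"
        }
    ' "$1"
}

# Assumed default: inside a container on cgroup v2, the container's own
# cgroup is commonly visible at /sys/fs/cgroup. Override with CPU_STAT.
stat_file="${CPU_STAT:-/sys/fs/cgroup/cpu.stat}"
if [ -r "$stat_file" ]; then
    echo "throttled periods: $(throttle_pct "$stat_file")%"
fi
```

The counters are cumulative since the cgroup was created, so a persistently high percentage suggests the CPU limit is too tight for the workload; comparing two readings taken some time apart gives the current rate.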
Signals to Check
# virtualization hint
top -b -n 1 | grep -i 'Cpu' || true
# steal time appears in some tools; also check:
vmstat 1 5
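Steal time can also be computed without top or vmstat by comparing two samples of /proc/stat. This is a hedged sketch rather than a definitive tool: steal_pct is a hypothetical helper, and it assumes the aggregate "cpu" line uses the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# Sketch: percentage of CPU time stolen between two /proc/stat samples.
steal_pct() {
    # $1, $2: files holding an earlier and a later copy of /proc/stat
    awk '
        $1 == "cpu" {
            total = 0
            # Fields 2..9 are user..steal; guest time is already part of user/nice.
            for (i = 2; i <= 9; i++) total += $i
            if (NR == FNR) { t0 = total; s0 = $9 }   # first file
            else           { t1 = total; s1 = $9 }   # second file
        }
        END {
            if (t1 > t0) printf "%.1f\n", 100 * (s1 - s0) / (t1 - t0)
            else         print "0.0"
        }
    ' "$1" "$2"
}

# Usage: take two samples one second apart (Linux only).
if [ -r /proc/stat ]; then
    before=$(mktemp); after=$(mktemp)
    cat /proc/stat > "$before"
    sleep 1
    cat /proc/stat > "$after"
    echo "steal: $(steal_pct "$before" "$after")%"
    rm -f "$before" "$after"
fi
```

A value that stays near zero on a VM suggests the hypervisor is not the bottleneck; a sustained nonzero value points toward contention on the host rather than inside the guest.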
Operational Guidance
- Separate critical workloads into dedicated pools (nodes or VM groups).
- Use quotas/limits regardless of substrate.
- Document blast radius per layer: node, hypervisor, AZ, region.
Failure Modes
- Kernel-level issue: every container sharing that kernel is impacted simultaneously.
- Host disk saturation: many containers on the same host degrade together.
- Hypervisor contention: VM steal time increases and latency rises.
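One way to tell these modes apart on a live host is the kernel's pressure stall information (PSI), which reports how long tasks were stalled waiting on CPU, IO, or memory. The sketch below assumes a Linux kernel with PSI enabled, which exposes files under /proc/pressure; psi_avg10 is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the 10-second stall average from a PSI file.
# PSI lines look like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
psi_avg10() {
    # $1: path to a PSI file; $2: line kind, "some" or "full"
    awk -v kind="$2" '
        $1 == kind {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^avg10=/) { sub(/^avg10=/, "", $i); print $i }
        }
    ' "$1"
}

# Print the share of recent time in which at least one task was stalled.
for res in cpu io memory; do
    f="/proc/pressure/$res"
    if [ -r "$f" ]; then
        echo "$res some avg10: $(psi_avg10 "$f" some)"
    fi
done
```

High io pressure across many containers on one host is consistent with disk saturation, while cpu pressure inside a VM combined with rising steal time is consistent with hypervisor contention.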