Cluster Upgrades and Versioning Strategy
Upgrade Principles
- Upgrade in phases: control plane → nodes → add-ons.
- Confirm compatibility: CNI, CSI, Ingress, metrics, autoscalers.
- Have a rollback path: snapshots/backups, version pinning, and a tested restore.
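One way to make the compatibility check concrete is to compare the minor version of the control plane with that of each kubelet before starting a phase. The helpers below are a minimal sketch: the function names `minor_of`, `minor_skew`, and `skew_ok` are hypothetical, and the default limit of one minor version is a conservative assumption, not the official policy for any particular release.

```shell
# Extract the minor number from a version string such as "v1.29.3".
minor_of() {
  local v="${1#v}"        # drop a leading "v" if present
  v="${v#*.}"             # drop the major component and its dot
  echo "${v%%[!0-9]*}"    # keep only the leading digits of the minor
}

# Print how many minor versions the node ($2) trails the control plane ($1).
minor_skew() {
  echo $(( $(minor_of "$1") - $(minor_of "$2") ))
}

# Succeed when the skew is between 0 and the allowed maximum ($3).
# The default of 1 is an assumption; pass the limit that applies to you.
skew_ok() {
  local skew max="${3:-1}"
  skew="$(minor_skew "$1" "$2")"
  [ "$skew" -ge 0 ] && [ "$skew" -le "$max" ]
}
```

For example, `skew_ok v1.29.3 v1.28.0` succeeds, while `skew_ok v1.29.3 v1.27.1` fails unless a limit of 2 is passed as the third argument. The kubelet version of a node can be read with `kubectl get node <node> -o jsonpath='{.status.nodeInfo.kubeletVersion}'`.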
Pre-Upgrade Checklist
kubectl version
kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide
kubectl get apiservices | grep -v True || true
kubectl get events -A --sort-by=.lastTimestamp | tail -30
Control Plane Health Gates
kubectl get --raw='/readyz?verbose'
kubectl get --raw='/livez?verbose'
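To use these endpoints as an actual gate rather than a manual check, the readiness probe can be polled until it succeeds. The following is a sketch only: `wait_for_readyz` is a hypothetical helper name, and the default attempt count and delay are arbitrary assumptions to tune for your environment.

```shell
# Poll the API server's /readyz endpoint until it succeeds or attempts run out.
# Arguments: max attempts (default 30), seconds between attempts (default 10).
wait_for_readyz() {
  local attempts="${1:-30}" delay="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # kubectl exits non-zero when the endpoint reports a failure.
    if kubectl get --raw='/readyz' >/dev/null 2>&1; then
      echo "control plane ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "control plane not ready after ${attempts} attempt(s)" >&2
  return 1
}
```

Calling `wait_for_readyz 30 10 || exit 1` between the control plane phase and the node phase stops the upgrade if the API server does not become ready in time.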
Node Upgrade Runbook (Generic)
- Cordon node
- Drain respecting PDBs
- Upgrade node components
- Uncordon and validate workloads
kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data --timeout=10m
# upgrade node components here (platform-specific)
kubectl uncordon <node>
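The runbook above can be wrapped in a function so that a failed step stops the sequence instead of continuing to the next command. This is a sketch under assumptions: `upgrade_node` and `upgrade_node_components` are hypothetical names, the component upgrade is a placeholder to replace with your platform's procedure, and the 5-minute readiness timeout is arbitrary.

```shell
# Placeholder for the platform-specific step; replace with your own logic.
upgrade_node_components() {
  echo "TODO: upgrade kubelet and container runtime on $1"
}

# Run the cordon, drain, upgrade, uncordon sequence for a single node.
# Any failing step aborts the sequence and leaves the node as it is.
upgrade_node() {
  local node="$1"
  kubectl cordon "$node" || return 1
  # Drain respects PDBs; a timeout here usually means a PDB is blocking.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m || return 1
  upgrade_node_components "$node" || return 1
  kubectl uncordon "$node" || return 1
  # Wait until the node reports the Ready condition again.
  kubectl wait --for=condition=Ready "node/$node" --timeout=5m
}
```

Upgrading one node at a time, for example `upgrade_node <node> || exit 1` inside a loop, keeps a failed drain from cascading across the pool.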
Failure Modes
- Add-on mismatch (CNI/CSI/Ingress) → networking/storage breakage.
- PDBs too strict → drains stuck and upgrade stalls.
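The stuck-drain failure mode can often be spotted before draining by listing PodDisruptionBudgets that currently allow no voluntary disruptions. A minimal sketch, assuming `awk` is available; `blocking_pdbs` is a hypothetical helper name.

```shell
# Print "namespace/name" for every PDB whose status currently allows
# zero voluntary disruptions; such PDBs will block a drain.
blocking_pdbs() {
  kubectl get pdb -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}' |
    awk '$2 == 0 { print $1 }'
}
```

An empty result suggests no PDB is blocking evictions at that moment. Any PDB that is listed is worth reviewing (replica count, `minAvailable`, `maxUnavailable`) before starting the drain.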