Autoscaling: HPA, VPA, Cluster Autoscaler

Operate HPA/VPA/Cluster Autoscaler with clear guardrails to prevent scaling storms and capacity surprises.

On this page

Autoscaling Components

HPA: scales replicas based on metrics (CPU/custom).
VPA: adjusts requests (and sometimes limits) for containers.
Cluster Autoscaler: adds/removes nodes based on pending pods.

Operational Guardrails

Set maxReplicas and sane stabilization windows.
Watch for metric gaps (metrics-server/Prometheus outages).
Prevent scaling storms via rate limits and step policies.

Inspect HPA

kubectl -n <ns> get hpa
kubectl -n <ns> describe hpa <name> | sed -n '1,220p'
kubectl -n <ns> get events --sort-by=.lastTimestamp | tail -25

Detect Capacity Issues

kubectl get pods -A --field-selector=status.phase=Pending
kubectl -n <ns> describe pod <pod> | grep -n 'Events' -A60
kubectl get nodes

Failure Modes

HPA stuck: metrics missing or targets misconfigured.
Scale up → Pending: cluster has no capacity; CA not working.
Scale down too fast: flapping causes errors and cold caches.

← Disruption Budgets, Drains, and Maintenance

NetworkPolicy and Traffic Control →