cgroups Basics (CPU/Memory Isolation)
What Are cgroups?
Control Groups (cgroups) are a Linux kernel feature that limits and isolates the resource usage of groups of processes. They control:
- CPU usage
- Memory usage
- I/O bandwidth
- Number of processes
Containers (Docker, Kubernetes) rely on cgroups, but cgroups exist without containers too: on production Linux hosts, systemd uses them to manage every service.
Why cgroups Matter in Production
Without isolation:
- A memory leak can crash the entire host
- A batch job can starve your API
- A runaway process can fork bomb the system
With cgroups:
- Each service gets defined resource boundaries
- Blast radius is reduced
- Incidents are contained
cgroups v1 vs v2
Modern Linux distributions use cgroups v2 (unified hierarchy). You can check:
mount | grep cgroup
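If the output shows a cgroup2 filesystem mounted at /sys/fs/cgroup, you are using v2. If you also see per-controller cgroup mounts (cpu, memory, and so on), the host is running v1 or a hybrid setup.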
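A more direct check than scanning mount output is to ask for the filesystem type at /sys/fs/cgroup. The sketch below assumes GNU stat and the conventional mount point. The helper name classify_cgroup_fs is made up for illustration.

```shell
#!/bin/sh
# Map the filesystem type reported for /sys/fs/cgroup to a cgroup mode.
# classify_cgroup_fs is an illustrative helper, not a standard tool.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy
        tmpfs)     echo "v1-or-hybrid" ;;  # v1 controllers are mounted under a tmpfs
        *)         echo "unknown" ;;
    esac
}

# Usage on a live host (assumes GNU coreutils stat):
#   classify_cgroup_fs "$(stat -fc %T /sys/fs/cgroup)"
```

Passing the filesystem type in as an argument keeps the classification logic separate from the host it runs on, so it can be exercised without a real cgroup mount.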
How systemd Uses cgroups
Every systemd service runs inside a cgroup. You can inspect:
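systemctl status myapp
systemd-cgls
systemd automatically creates slices (which group units into a hierarchy) and scopes (which wrap processes that systemd did not start itself, such as user sessions).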
Limiting CPU with systemd
Inside your unit file:
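[Service]
CPUQuota=50%
This caps the service at 50% of one CPU's time. The percentage is relative to a single CPU, so values above 100% are valid on multi-core hosts: CPUQuota=200% allows up to two full CPUs.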
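On cgroups v2, a CPU quota ends up as a "quota period" pair in the cgroup's cpu.max file. The arithmetic can be sketched as follows, assuming the default 100 ms period (100000 microseconds). The function name quota_to_cpu_max is hypothetical.

```shell
#!/bin/sh
# Convert a CPUQuota percentage into the "<quota_us> <period_us>" pair
# found in cpu.max. Assumes a 100000 us period unless $2 overrides it.
# quota_to_cpu_max is an illustrative helper, not part of systemd.
quota_to_cpu_max() {
    period_us=${2:-100000}
    echo "$(( $1 * period_us / 100 )) $period_us"
}

quota_to_cpu_max 50     # prints: 50000 100000
quota_to_cpu_max 200    # prints: 200000 100000
```

Comparing this against cat /sys/fs/cgroup/<path>/cpu.max for a running service is a quick way to confirm that the limit in the unit file is the one actually applied.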
Limiting Memory
[Service]
MemoryMax=500M
If the cgroup's memory use reaches this limit and the kernel cannot reclaim enough, the OOM killer terminates a process inside that cgroup. Better one service dies than the entire host.
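To tell whether a service has already been killed for exceeding its memory limit, the cgroup v2 memory.events file keeps an oom_kill counter. A minimal sketch for reading it, assuming the usual "key value" line format. The helper name oom_kill_count is an assumption for illustration.

```shell
#!/bin/sh
# Print the oom_kill counter from a cgroup v2 memory.events file.
# $1 is the path to memory.events; prints 0 if the key is absent.
# oom_kill_count is an illustrative helper name.
oom_kill_count() {
    awk '$1 == "oom_kill" { n = $2 } END { print n + 0 }' "$1"
}

# Usage on a live host:
#   oom_kill_count /sys/fs/cgroup/<path>/memory.events
```

A counter that keeps climbing between checks suggests the limit is too tight for the workload, or that the service is leaking memory.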
Live Inspection
Inspect cgroup paths:
cat /proc/<pid>/cgroup
Check memory usage (v2 example):
cat /sys/fs/cgroup/<path>/memory.current
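The two commands above can be chained: on v2, /proc/<pid>/cgroup contains a line of the form 0::/some/path, and that path is relative to /sys/fs/cgroup. The sketch below extracts it. The helper name cgroup_v2_path is hypothetical, and the usage line assumes cgroup2 is mounted at /sys/fs/cgroup.

```shell
#!/bin/sh
# Print the cgroup v2 path from a /proc/<pid>/cgroup style file.
# The v2 entry has hierarchy ID 0 and an empty controller list: "0::/path".
# cgroup_v2_path is an illustrative helper name.
cgroup_v2_path() {
    sed -n 's/^0:://p' "$1"
}

# Usage on a live host (assumes cgroup2 is mounted at /sys/fs/cgroup):
#   cat "/sys/fs/cgroup$(cgroup_v2_path /proc/<pid>/cgroup)/memory.current"
```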
cgroups vs nice
- nice influences scheduling priority
- cgroups enforce hard limits
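nice is a hint that only matters when processes compete for CPU, and it does nothing for memory or process counts. cgroup limits are enforced by the kernel.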
Fork Bomb Protection
You can limit process count:
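[Service]
TasksMax=500
This prevents uncontrolled process spawning: once the service's cgroup holds 500 tasks (processes and threads both count), further fork() and clone() calls fail with EAGAIN.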
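To see how close a service is to its task limit, compare the pids.current and pids.max files in its cgroup. A rough sketch, assuming cgroups v2, where pids.max contains the literal string max when no limit is set. The helper name tasks_usage is made up for illustration.

```shell
#!/bin/sh
# Report how much of a pids limit is in use.
# $1 = contents of pids.current, $2 = contents of pids.max
# ("max" means no limit is configured).
# tasks_usage is an illustrative helper name.
tasks_usage() {
    if [ "$2" = "max" ]; then
        echo "$1 tasks, no limit"
    else
        echo "$1/$2 tasks ($(( $1 * 100 / $2 ))%)"
    fi
}

# Usage on a live host:
#   d=/sys/fs/cgroup/<path>
#   tasks_usage "$(cat "$d/pids.current")" "$(cat "$d/pids.max")"
```

A service that regularly sits near 100% needs either a higher TasksMax or a look at why it spawns so many tasks.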
Production Pattern: Isolate Everything
Good production design:
- Each service runs in its own cgroup
- CPUQuota defined for non-critical services
- MemoryMax defined for untrusted workloads
- TasksMax set to safe values
Common Production Mistakes
- Running everything without limits
- Blaming code when the issue is resource starvation
- Using nice instead of proper cgroup limits
- Not monitoring memory.current or the pressure files (memory.pressure, cpu.pressure)
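Pressure is exposed per cgroup on v2 through files such as memory.pressure and cpu.pressure, provided the kernel has pressure stall information (PSI) enabled. A minimal sketch for pulling out the 10-second "some" average, which could feed an alert. The helper name psi_some_avg10 is hypothetical, and the sketch assumes the standard PSI line format.

```shell
#!/bin/sh
# Print the "some" avg10 value from a PSI file such as memory.pressure.
# Expected line format: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
# psi_some_avg10 is an illustrative helper name.
psi_some_avg10() {
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }' "$1"
}

# Usage on a live host (requires PSI support in the kernel):
#   psi_some_avg10 /sys/fs/cgroup/<path>/memory.pressure
```

The value is the percentage of the last 10 seconds in which at least one task in the cgroup was stalled waiting on that resource. Sustained non-zero readings are an early sign of starvation, often visible before the limit itself is hit.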
Mental Model
cgroups are containment. They define how much damage a service can do to a host. In production, isolation is more important than raw performance.
Production Checklist
- Use systemd limits (CPUQuota, MemoryMax, TasksMax)
- Inspect cgroup assignments during incidents
- Prefer isolation over shared unlimited resources
- Monitor memory and CPU pressure per service