ps/top/htop: Finding What Hurts

Quickly identify hot processes, CPU steal, memory pressure, and runaway tasks.

Why ps/top/htop Are Production Tools (Not Just Commands)

In production, you do not debug by guessing. You debug by observing. ps/top/htop are the fastest way to answer:

  • Which process is hurting the system?
  • Is it CPU, memory, I/O, or just “looks busy”?
  • Is the problem one process, many processes, or the host itself?

Start With a Snapshot: ps

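ps is your “frozen frame”. It is ideal for quick, scriptable checks. Keep in mind that ps reports %CPU as total CPU time divided by how long the process has been running, so it is a lifetime average, not the current rate. For what is hot right now, use top or pidstat.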

Top CPU Consumers

ps aux --sort=-%cpu | head -n 15

Top Memory Consumers

ps aux --sort=-%mem | head -n 15

Production hint: %MEM is not enough by itself. Also watch RSS (resident memory, shown in KiB), because VSZ (the virtual size, which top calls VIRT) can be huge without real usage.
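A minimal sketch of an RSS-focused snapshot, assuming the procps ps shipped with most Linux distributions. The helper name rss_mib is hypothetical; it only converts the KiB values ps prints into MiB so large numbers are easier to compare.

```shell
# rss_mib: read "PID RSS_KiB COMMAND" lines on stdin and print RSS in MiB.
# (Hypothetical helper name; ps reports RSS in KiB.)
rss_mib() {
    awk '{
        pid = $1; rss = $2
        $1 = ""; $2 = ""          # drop the first two fields
        sub(/^ +/, "")            # keep the rest as the command name
        printf "%s %.1f MiB %s\n", pid, rss / 1024, $0
    }'
}

# Usage (a trailing "=" suppresses each column header):
#   ps -eo pid=,rss=,comm= --sort=-rss | head -n 15 | rss_mib
```

Sorting on rss rather than %mem gives the same order, but the absolute numbers make it easier to see how much RAM is actually at stake.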

Understand VIRT vs RES vs SHR

  • VIRT (VSZ in ps): virtual address space (can be large, not always a problem)
  • RES (RSS in ps): physical RAM currently used (this hurts)
  • SHR: the part of RES that may be shared with other processes (shared libraries, shared memory segments)

If RES grows steadily, suspect a memory leak or unbounded cache growth. If VIRT is huge but RES is stable, it may be normal (mmap, shared libs, etc.).
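One way to check whether RES really grows is to sample it at intervals. The sketch below assumes a Linux /proc filesystem, where /proc/<pid>/status contains a VmRSS line in kB; the helper name vmrss_kib and the PID 1234 are placeholders.

```shell
# vmrss_kib: print the VmRSS value (in KiB) from the contents of a
# /proc/<pid>/status file supplied on stdin.
vmrss_kib() {
    awk '$1 == "VmRSS:" { print $2 }'
}

# Usage: log the resident size of PID 1234 once a minute
#   while sleep 60; do
#       printf '%s %s\n' "$(date +%T)" "$(vmrss_kib < /proc/1234/status)"
#   done
```

A value that climbs across many samples without ever dropping back is the pattern worth investigating; a value that rises and then plateaus is more likely a cache warming up.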

Live View: top

top is your “moving picture”. Use it when load is changing and you need trends. Start top:

top

Critical top Metrics

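  • load average: 1m / 5m / 15m pressure trend (on Linux this counts runnable tasks plus tasks in uninterruptible sleep, so it is not a pure CPU figure)
  • %Cpu(s): user (us) vs system (sy) vs iowait (wa) vs steal (st)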
  • Mem: used vs available
  • Swap: activity is usually a bad sign in production
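Load average is easier to read once it is related to the number of CPUs. The following is a rough sketch, not a rule: the helper name load_per_core is hypothetical, and because Linux load includes tasks blocked on I/O, a value above 1 per core does not by itself prove CPU saturation.

```shell
# load_per_core LOAD CORES: divide a load average by the CPU count.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on a live host (nproc comes from GNU coreutils):
#   load_per_core "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```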

CPU Steal (Virtual Machines)

If you run on a VM, watch for “steal” time. High steal means the hypervisor is starving your VM of CPU, even if your app is fine. In top, look for st (steal).

Symptom:

  • App is slow
  • CPU seems high or inconsistent
  • Nothing obvious in process list

If steal is high, the fix may be infrastructure-level (bigger instance / less noisy neighbors), not code.
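Steal can also be measured without top. The sketch below assumes the Linux /proc/stat layout, where the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq, and steal counters in that order; the helper name steal_pct is hypothetical.

```shell
# steal_pct "CPU_LINE_1" "CPU_LINE_2": percentage of CPU time stolen
# between two samples of the aggregate "cpu" line from /proc/stat.
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            t[NR] = total
            s[NR] = $9                             # steal counter
        }
        END {
            dt = t[2] - t[1]
            if (dt > 0) printf "%.1f\n", 100 * (s[2] - s[1]) / dt
            else print "0.0"
        }'
}

# Usage: take two samples five seconds apart
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   steal_pct "$a" "$b"
```

Guest time is left out of the total because the kernel already includes it in the user counter.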

I/O Wait: The Silent Killer

If %Cpu shows high wa (iowait), CPUs are sitting idle while tasks wait for I/O to complete. That is usually slow disk, but it can also be a network filesystem. This is NOT a “CPU problem”. Restarting apps won’t fix slow storage.

Quick signal:

  • High load average
  • Many processes in D state
  • wa is high
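To see which processes are stuck in D state, the ps output can be filtered on its state column. This is a small sketch assuming procps ps; the helper name d_state is hypothetical.

```shell
# d_state: keep only lines whose first column (process state) starts with D.
d_state() {
    awk '$1 ~ /^D/'
}

# Usage: state, PID, kernel wait channel, command name
#   ps -eo state,pid,wchan:32,comm | d_state
```

The header line is dropped by the filter because its first column is "S". A process that shows up here across repeated runs is blocked on I/O, and the wchan column hints at where in the kernel it is waiting.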

Readability Upgrade: htop

htop is often easier for production work because it offers:

  • Tree view (parent/child)
  • Interactive sorting
  • Per-core CPU view
  • Kill/renice actions with less typing

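Install (pick the line for your distribution):

sudo apt install htop    # Debian, Ubuntu
sudo dnf install htop    # Fedora; on RHEL-family systems htop typically comes from EPEL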

Threads vs Processes Confusion

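Some applications use many threads. Because top shows one line per process by default, a multi-threaded process can report well over 100% CPU without telling you which thread is responsible. You can enable thread view at startup (or press H inside a running top):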

top -H

Production use case:

  • Java apps
  • Some Node native modules
  • Databases
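For Java in particular, a common follow-up is to match the hot thread from top -H against a thread dump. HotSpot thread dumps (for example from jstack) print the native thread ID in hexadecimal as nid=0x..., while top shows it in decimal, so a conversion is needed. The helper name tid_to_hex and the IDs below are placeholders.

```shell
# tid_to_hex TID: print a decimal thread ID in 0x... hexadecimal form.
tid_to_hex() {
    printf '0x%x\n' "$1"
}

# Usage (1234 is a placeholder PID, 12345 a placeholder thread ID):
#   top -H -p 1234        # the PID column now shows thread IDs
#   tid_to_hex 12345      # search the thread dump for this nid value
```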

When ps/top Are Not Enough: pidstat

If you need per-process CPU, memory, and I/O over time, use pidstat (from sysstat).

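pidstat -u 1    # CPU usage per process, every second
pidstat -r 1    # memory: page faults, VSZ, RSS
pidstat -d 1    # disk I/O: kB read/written per second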

This helps answer: is the process consistently heavy, or only spiking?
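One rough way to tell "consistently heavy" from "spiking" is to compare the minimum, average, and maximum of the sampled values. The sketch below is generic: the helper name summarize_col is hypothetical, and the column number holding %CPU is an assumption that depends on your sysstat version and time format, so check it against the header pidstat prints.

```shell
# summarize_col N: print min, average, and max of the numeric values in
# column N of stdin. Rows where that column is not a plain number
# (headers, blank lines) are skipped.
summarize_col() {
    awk -v col="$1" '
        $col ~ /^[0-9]+(\.[0-9]+)?$/ {
            v = $col + 0
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v; n++
        }
        END {
            if (n > 0) printf "min=%.2f avg=%.2f max=%.2f\n", min, sum / n, max
        }'
}

# Usage (1234 is a placeholder PID; column 8 is an assumption):
#   pidstat -u -p 1234 1 30 | grep -v '^Average' | summarize_col 8
```

A small gap between min and max suggests steady load; a low average with a high max suggests spikes.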

Common Production Mistakes

  • Confusing high load with high CPU (on Linux, load also counts tasks blocked in uninterruptible I/O wait)
  • Chasing VIRT instead of RES
  • Ignoring steal time on VMs
  • Killing processes without checking why they are busy
  • Restarting apps when the real bottleneck is disk/network

Mental Model

Always classify the pain before acting:

  • CPU-bound: high user/system CPU, low iowait
  • I/O-bound: high iowait, D-state processes, high load
  • Memory pressure: low available memory, swap activity, OOM events
  • Infrastructure starvation: high steal, noisy neighbor signals

Once you classify correctly, your next tool choice becomes obvious.

Production Checklist

  • Use ps for snapshots, top/htop for trends
  • Interpret load together with CPU (especially iowait)
  • Track RES for real memory usage
  • Watch steal time on VMs
  • Use pidstat when you need time-series behavior