System Introspection Toolkit

Know your host fast: uname, lsb_release, df, free, lscpu, lsblk, and friends.


Why Fast System Introspection Matters

In production, speed matters, but not the speed of firing off random commands. You need a repeatable, reliable checklist to understand a host quickly: what it is, how it is configured, where it is failing, and what is about to fail.

The 60-Second Host Snapshot

If you can only run a few commands, run these:

uname -a
uptime
df -h
free -h
ps aux --sort=-%cpu | head
ss -lntp | head

This tells you:

Kernel and OS basics
Load and runtime duration
Disk pressure
Memory pressure
Top CPU consumers
What is listening on ports
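
One way to make the snapshot repeatable is to wrap it in a small script. The sketch below is only an illustration: the function names run_section and host_snapshot are made up for this example, and the 15-line cap per command is an arbitrary choice. It skips any tool that is not installed instead of failing.

```shell
# run_section CMD [ARGS...]: print a header, then run the command if it exists.
run_section() {
    printf '== %s ==\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        # Cap the output so one noisy command cannot bury the rest.
        "$@" 2>&1 | head -n 15
    else
        printf '(skipped: %s not found)\n' "$1"
    fi
    echo
}

# host_snapshot: the six commands from the list above, in the same order.
host_snapshot() {
    run_section uname -a
    run_section uptime
    run_section df -h
    run_section free -h
    run_section ps aux --sort=-%cpu
    run_section ss -lntp
}

# Example usage (not run automatically):
#   host_snapshot > "snapshot-$(hostname)-$(date +%s).txt"
```

Saving the output to a file gives you something to compare against after a change or during an incident review.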

OS and Distribution

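Identify the distro and version precisely (useful for package decisions). /etc/os-release is present on most modern distributions; lsb_release may need to be installed separately: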

cat /etc/os-release
lsb_release -a
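
When a script needs the distro rather than a human, the ID and VERSION_ID fields of /etc/os-release are usually enough. A minimal sketch, assuming the file follows the standard shell-compatible KEY=value layout; os_summary is a name invented for this example.

```shell
# os_summary [FILE]: print "ID VERSION_ID" from an os-release style file.
os_summary() {
    file=${1:-/etc/os-release}
    [ -r "$file" ] || { echo "unknown"; return 1; }
    # A bare filename would make "." search PATH, so force a path.
    case $file in
        */*) ;;
        *) file=./$file ;;
    esac
    # Source in a subshell so the file's variables do not leak into the caller.
    (
        unset ID VERSION_ID
        . "$file"
        printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}"
    )
}

# Example usage:
#   os_summary
```

Some rolling-release distributions do not set VERSION_ID, in which case the second field prints as unknown.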

CPU, Memory, and Hardware

lscpu
nproc
free -h
vmstat 1 5

What to look for:

CPU cores vs load average
Memory available vs used
Swap usage trends
Context switching and run queue pressure
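
The first two checks reduce to simple ratios. The sketch below shows one possible way to compute them; the function names are hypothetical, and the inputs are assumed to come from /proc/loadavg, nproc, and /proc/meminfo on a Linux host.

```shell
# load_per_core LOAD CORES: load average divided by core count.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# mem_available_pct: read /proc/meminfo-style text on stdin and
# print MemAvailable as a whole-number percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.0f\n", avail * 100 / total }'
}

# Example usage on a live host:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
#   mem_available_pct < /proc/meminfo
```

A load-per-core value that stays above 1.0 suggests more runnable or I/O-blocked tasks than cores. A load of 8 means very different things on a 4-core host and a 64-core host, which is why the raw number alone is not enough.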

Storage and Filesystem Reality

lsblk
df -h
mount | head -n 30

Production checks:

Is /var filling up?
Are mounts correct?
Any unexpected tmpfs usage?
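
The "is anything filling up?" question can be answered mechanically by filtering df output. This is a rough sketch: full_filesystems is a made-up name, the default threshold of 85 percent is an assumption, and the parsing relies on the POSIX output format of df -P (it will misread mount points that contain spaces).

```shell
# full_filesystems [THRESHOLD]: read `df -P` output on stdin and print
# "MOUNTPOINT USE%" for every filesystem at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="${1:-85}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example usage:
#   df -P | full_filesystems 90
```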

Disk Pressure and “Deleted But Still Open”

If df reports a full filesystem but you cannot find the large files that account for it, check:

lsof | grep deleted | head

Common production issue: a log file is deleted while a process still holds its file handle, so the space is not released until the process closes the file or restarts.
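
On Linux the same information is visible without lsof, because the kernel appends " (deleted)" to the link target of an unlinked file under /proc/PID/fd. The sketch below takes that approach; deleted_open_files is a hypothetical helper, and it only sees processes you have permission to inspect (typically all of them as root).

```shell
# deleted_open_files [PROC_ROOT]: print "PID FD TARGET" for every open
# descriptor whose target has been unlinked.
deleted_open_files() {
    root=${1:-/proc}
    for fd in "$root"/[0-9]*/fd/*; do
        # Unreadable or vanished entries are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#"$root"/}
                pid=${pid%%/*}
                printf '%s %s %s\n' "$pid" "${fd##*/}" "$target"
                ;;
        esac
    done
}

# Example usage:
#   deleted_open_files | head
```

Not every match is a leaked log file; some programs unlink temporary files on purpose. Treat the output as a list of candidates, then check which ones are large.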

Networking Snapshot

ip a
ip route
resolvectl status
ss -lntp

What to confirm:

Correct IP and routes
DNS resolver health
Listening services match expectation
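
The last check, listening services versus expectation, can be scripted by comparing a list of expected ports with ss output. A minimal sketch, assuming TCP-only output from ss -Hlnt (where the local address is the fourth column); missing_ports and the port numbers in the example are illustrative.

```shell
# missing_ports PORT...: read `ss -Hlnt` output on stdin and print each
# expected port that has no listener.
missing_ports() {
    # The port is whatever follows the last ":" in the local address,
    # which also works for IPv6 forms such as [::]:22.
    listening=$(awk '{ n = split($4, a, ":"); print a[n] }' | sort -u)
    for port in "$@"; do
        printf '%s\n' "$listening" | grep -qxF "$port" || printf '%s\n' "$port"
    done
}

# Example usage:
#   ss -Hlnt | missing_ports 22 443
```

Empty output means every expected port has a listener. It does not tell you that the right process owns the port; ss -lntp answers that.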

System Limits and Kernel Signals

ulimit -a
sysctl -a | head
dmesg | tail -n 50

dmesg is where you often see:

OOM killer events
Disk I/O errors
Kernel warnings
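
Scanning the kernel log by eye is slow on a noisy host, so a filter helps. The pattern list below is an assumption based on commonly seen message fragments, not an exhaustive or authoritative set, and kernel_red_flags is a name invented for this sketch. Exact wording varies between kernel versions and drivers.

```shell
# kernel_red_flags: read kernel log text on stdin and print lines that
# match a few common trouble patterns (case-insensitive).
kernel_red_flags() {
    grep -Ei 'out of memory|oom-killer|i/o error|call trace|segfault'
}

# Example usage (dmesg may require root on hosts that restrict it):
#   dmesg | kernel_red_flags | tail -n 20
```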

Service Inventory (What Is Running?)

systemctl --type=service --state=running
systemctl status myapp --no-pager

Production engineers keep an inventory mindset: what should be running vs what is running.
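
The inventory mindset can be expressed as a diff between an expected list and the running list. The following is a sketch under assumptions: missing_services is a hypothetical helper, the unit names in the example are placeholders, and the input is expected to have the unit name in the first column, as systemctl list-units prints with --plain and --no-legend.

```shell
# missing_services UNIT...: read running units on stdin (unit name in
# the first column) and print each expected unit that is not present.
missing_services() {
    running=$(awk '{ print $1 }')
    for unit in "$@"; do
        printf '%s\n' "$running" | grep -qxF "$unit" || printf '%s\n' "$unit"
    done
}

# Example usage (myapp.service is the placeholder used on this page):
#   systemctl list-units --type=service --state=running --no-legend --plain \
#       | missing_services sshd.service myapp.service
```

Keeping the expected list in version control next to the host's configuration makes "what should be running" an explicit artifact rather than something held in memory.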

Common Production Mistakes

Running random commands without a mental model
Ignoring disk and memory signals until an outage
Not correlating load with CPU cores
Forgetting to check dmesg during weird failures

Mental Model

You are building situational awareness. Every command you run should answer one question: CPU? Memory? Disk? Network? Services? If you cannot state what you are checking, you are debugging blindly.

Production Checklist

60-second snapshot commands known by heart
Disk/memory/network inspected before deep dives
dmesg checked for kernel-level clues
systemctl inventory used to verify expectations
