
Socket Debugging Tools in Linux Production

Debug TCP/UDP socket issues using ss, netstat, lsof, tcpdump, and connection state analysis. Identify SYN floods, TIME-WAIT buildup, backlog exhaustion, and hidden network bottlenecks.


Why Socket Debugging Matters

Most production services communicate over TCP or UDP. When latency spikes or connections fail, the problem is often at the socket layer: connection backlog saturation, SYN floods, TIME-WAIT buildup, or file descriptor exhaustion.

Symptom

  • Connection refused or connection reset errors
  • Service reachable locally but not remotely
  • High number of TIME-WAIT connections
  • Intermittent timeouts under load
  • Load balancer marks instance unhealthy

Root Cause

  • Backlog queue full
  • Too many concurrent connections
  • File descriptor limits reached
  • Network packet drops or retransmissions
  • Firewall or NAT misconfiguration

Step 1: Inspect Listening Sockets

ss -ltnp

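Check which process is bound to which port, and on which address. Without root, the -p flag only shows process details for sockets you own, so run the command with sudo to see every listener's owner.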
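A listener bound only to a loopback address is a common reason a service is reachable locally but not remotely. The following is a minimal sketch, assuming an ss version that supports -H (no header) and that the fourth column is the local address; the function name loopback_only is hypothetical.

```shell
# Print listeners that are bound only to loopback (127.0.0.0/8 or ::1).
# Reads `ss -Hltn` output on stdin; column 4 is Local Address:Port.
loopback_only() {
    awk '$1 == "LISTEN" && $4 ~ /^(127\.|\[::1\])/ { print $4 }'
}

# Usage on a live host:
#   ss -Hltn | loopback_only
```

Any address printed here accepts connections only from the same host. A service that should be reachable remotely needs to bind to a routable address or to 0.0.0.0 / [::].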

Step 2: Inspect Connection States

ss -tan

Common TCP states:

  • LISTEN
  • ESTABLISHED (shown as ESTAB in ss output)
  • SYN-SENT
  • SYN-RECV
  • TIME-WAIT
  • CLOSE-WAIT

Count Connections Per State

ss -Htan | awk '{print $1}' | sort | uniq -c

Step 3: Detect Backlog Issues

ss -ltn

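For listening sockets, the two queue columns have a special meaning: Recv-Q is the number of connections that have completed the handshake and are waiting for the application to accept() them, and Send-Q is the configured backlog limit. A Recv-Q that stays close to Send-Q means the application is not accepting connections fast enough and the backlog is under pressure.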
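The comparison can be automated with a short filter. This is a sketch, not a tuned alert: the function name backlog_pressure and the default threshold of 80 percent are assumptions chosen for illustration.

```shell
# Flag listening sockets whose accept queue is at least PCT percent full.
# Reads `ss -Hltn` output on stdin:
#   $2 = Recv-Q (connections waiting for accept())
#   $3 = Send-Q (configured backlog limit)
#   $4 = local address and port
backlog_pressure() {
    pct="${1:-80}"   # hypothetical default threshold, in percent
    awk -v pct="$pct" '
        $1 == "LISTEN" && $3 > 0 && ($2 * 100) >= ($3 * pct) {
            printf "%s recvq=%d backlog=%d\n", $4, $2, $3
        }'
}

# Usage on a live host:
#   ss -Hltn | backlog_pressure 80
```

A single snapshot can miss short bursts, so it may be worth running the check several times under load before drawing conclusions.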

Step 4: Identify Per-Process Connections

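sudo ss -tanp | grep 'pid=PID,'

Replace PID with the target process ID. sudo is needed to see process details for sockets owned by other users.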

Or:

sudo lsof -iTCP -sTCP:ESTABLISHED
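When the offending process is not yet known, it can help to rank processes by connection count instead of grepping for one PID. The sketch below assumes the users:(("name",pid=N,fd=M)) column that ss -p prints; conns_per_pid is a hypothetical name, and only the first PID on each line is counted.

```shell
# Count TCP connections per owning PID, busiest first.
# Reads `ss -Htanp` output on stdin. Lines without a pid= field
# (for example, sockets you lack permission to inspect) are skipped.
conns_per_pid() {
    awk '{
        if (match($0, /pid=[0-9]+/)) {
            # Strip the leading "pid=" (4 characters) to keep the number.
            pid = substr($0, RSTART + 4, RLENGTH - 4)
            count[pid]++
        }
    }
    END { for (p in count) print count[p], p }' | sort -rn
}

# Usage on a live host:
#   sudo ss -Htanp | conns_per_pid | head
```

Each output line is a connection count followed by a PID.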

Step 5: Inspect Kernel Network Stats

netstat -s

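The counters are cumulative since boot, so compare two runs a few seconds apart and look for values that keep increasing: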

  • TCP retransmissions
  • Listen queue overflows
  • Dropped packets
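The listen queue counters can also be read directly from /proc/net/netstat, which is convenient on hosts without net-tools. This is a minimal sketch, assuming the usual layout of a TcpExt: line of counter names followed by a TcpExt: line of values; tcpext_counter is a hypothetical helper name.

```shell
# Print the value of one TcpExt counter, such as ListenOverflows.
# Reads /proc/net/netstat-style input on stdin: the first "TcpExt:" line
# holds counter names, the second holds the matching values.
tcpext_counter() {
    awk -v name="$1" '
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) col[$i] = i   # remember each column index
            seen = 1
            next
        }
        $1 == "TcpExt:" && seen {
            if (name in col) print $(col[name])
            exit
        }'
}

# Usage on a live host:
#   tcpext_counter ListenOverflows < /proc/net/netstat
#   tcpext_counter ListenDrops     < /proc/net/netstat
```

Sampling the same counter twice and subtracting gives the number of overflows during the interval, which is more meaningful than the absolute value.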

Step 6: Packet-Level Inspection

sudo tcpdump -ni any port 443

Useful for confirming whether packets arrive or are dropped upstream.
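On a busy port a full capture is noisy. Restricting the capture to initial SYN packets shows whether new connection attempts reach the host at all. The sketch below only builds the filter string; syn_filter is a hypothetical helper, and the tcp[tcpflags] syntax may not match IPv6 traffic on older libpcap versions.

```shell
# Build a capture filter matching only initial SYNs (SYN set, ACK clear)
# for a single TCP port.
syn_filter() {
    printf 'tcp port %s and tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn\n' "$1"
}

# Usage on a live host (stops after 50 packets):
#   sudo tcpdump -ni any -c 50 "$(syn_filter 443)"
```

If SYNs arrive but no SYN-ACK leaves, the problem is likely local (firewall rule or full queue). If no SYNs arrive, look upstream.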

Common Production Failure Patterns

  • SYN flood: many SYN-RECV entries
  • TIME-WAIT storm: high connection churn
  • CLOSE-WAIT accumulation: app not closing sockets properly
  • Connection resets: backend crashing or overloaded
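The state-based patterns above can be sketched as a simple threshold check. The function name flag_states and all three default thresholds are hypothetical placeholders; sensible values depend entirely on the traffic level of the service.

```shell
# Warn when a TCP state count crosses a threshold.
# Reads `ss -Htan` output on stdin; column 1 is the state.
# Arguments: max SYN-RECV, max TIME-WAIT, max CLOSE-WAIT.
flag_states() {
    awk -v synrecv="${1:-200}" -v timewait="${2:-20000}" -v closewait="${3:-100}" '
        { n[$1]++ }
        END {
            if (n["SYN-RECV"]   > synrecv)   print "SYN-RECV high: "   n["SYN-RECV"]
            if (n["TIME-WAIT"]  > timewait)  print "TIME-WAIT high: "  n["TIME-WAIT"]
            if (n["CLOSE-WAIT"] > closewait) print "CLOSE-WAIT high: " n["CLOSE-WAIT"]
        }'
}

# Usage on a live host:
#   ss -Htan | flag_states 200 20000 100
```

A warning here is a prompt to investigate, not a diagnosis: a high TIME-WAIT count, for example, can be normal on a host that opens many short-lived outbound connections.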

Mitigation

  • Increase backlog size if justified
  • Tune net.core.somaxconn
  • Fix application connection handling
  • Enable connection pooling
  • Scale horizontally
  • Adjust load balancer settings

Check somaxconn:

cat /proc/sys/net/core/somaxconn
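The value matters because the kernel silently caps the backlog an application passes to listen() at somaxconn, so raising the backlog in application config alone may have no effect. A small sketch of that rule follows; effective_backlog is a hypothetical helper and the value 1024 in the usage line is only an example.

```shell
# Print the backlog a listener actually gets: the requested value,
# capped at net.core.somaxconn.
effective_backlog() {
    requested="$1"
    somaxconn="$2"
    if [ "$requested" -gt "$somaxconn" ]; then
        echo "$somaxconn"
    else
        echo "$requested"
    fi
}

# Usage, with a hypothetical application backlog of 1024:
#   effective_backlog 1024 "$(cat /proc/sys/net/core/somaxconn)"
```

The result should match the Send-Q value that ss -ltn reports for the listener once the service has been restarted.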

Production Safety

  • Investigate before tuning kernel parameters
  • Avoid random sysctl changes without load testing
  • Correlate socket states with application logs
  • Check file descriptor limits alongside socket counts
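For the file descriptor check, the sketch below compares a process's open descriptor count with its soft limit using /proc. The name fd_usage is hypothetical, and inspecting another user's process typically requires root.

```shell
# Print "open/limit" for a PID: the number of open file descriptors
# and the soft "Max open files" limit.
fd_usage() {
    pid="$1"
    # Each entry in /proc/PID/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | awk 'END { print NR }')
    # Column 4 of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open/$limit"
}

# Usage, with a hypothetical service PID:
#   sudo sh -c '. ./fd_usage.sh; fd_usage 1234'
```

Every socket consumes one descriptor, so a process whose open count approaches the limit will start failing accept() and connect() calls even when the backlog looks healthy.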

Verification Checklist

ss -ltnp
ss -Htan | awk '{print $1}' | sort | uniq -c
netstat -s | grep -i listen

  • No excessive SYN-RECV buildup
  • No abnormal CLOSE-WAIT accumulation
  • Backlog not saturated
  • Connection states consistent with traffic level

Why This Matters in Real Infrastructure

Many “application bugs” are actually socket-level issues. Understanding TCP states and connection patterns allows engineers to isolate network bottlenecks quickly and prevent cascading outages during traffic spikes.