Socket Debugging Tools in Linux Production
Why Socket Debugging Matters
Most production services communicate over TCP or UDP. When latency spikes or connections fail, the problem is often at the socket layer: connection backlog saturation, SYN floods, TIME-WAIT buildup, or file descriptor exhaustion.
Symptoms
- Connection refused or connection reset errors
- Service reachable locally but not remotely
- High number of TIME-WAIT connections
- Intermittent timeouts under load
- Load balancer marks instance unhealthy
Root Causes
- Backlog queue full
- Too many concurrent connections
- File descriptor limits reached
- Network packet drops or retransmissions
- Firewall or NAT misconfiguration
Step 1: Inspect Listening Sockets
ss -ltnp
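Check which process is bound to which port and on which address. Without root privileges, ss shows process details only for sockets you own, so run it with sudo to see every listener.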
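The bind address in this output is worth checking against the "reachable locally but not remotely" symptom: a listener bound to 127.0.0.1 or [::1] accepts only local connections. A minimal sketch, assuming the default ss column layout (State, Recv-Q, Send-Q, Local Address:Port); loopback_listeners is a hypothetical helper name, not part of ss:

```shell
# List listeners bound only to loopback.
# Reads `ss -ltn` or `ss -ltnp` output on stdin and prints the local address.
loopback_listeners() {
    awk '$1 == "LISTEN" && ($4 ~ /^127\./ || $4 ~ /^\[::1\]/) { print $4 }'
}
```

Usage: ss -ltn | loopback_listeners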
Step 2: Inspect Connection States
ss -tan
Common TCP states:
- LISTEN
- ESTABLISHED (displayed as ESTAB by ss)
- SYN-SENT
- SYN-RECV
- TIME-WAIT
- CLOSE-WAIT
Count Connections Per State
ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c
Step 3: Detect Backlog Issues
ss -ltn
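Compare Recv-Q and Send-Q values. For a listening TCP socket, Recv-Q is the number of connections waiting in the accept queue and Send-Q is the maximum size of that queue. A Recv-Q that stays close to Send-Q indicates backlog pressure: the application is not calling accept() fast enough.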
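The comparison can be scripted. A minimal sketch, assuming the default ss -ltn column layout and treating Send-Q as the configured queue limit; backlog_pressure and its 80 percent default are hypothetical choices, not recommendations:

```shell
# Report listening sockets whose accept queue is at least PCT percent full.
# Reads `ss -ltn` output on stdin. PCT defaults to 80 (arbitrary placeholder).
backlog_pressure() {
    awk -v pct="${1:-80}" '
        $1 == "LISTEN" && $3 > 0 && ($2 * 100 >= $3 * pct) {
            printf "%s queue=%d max=%d\n", $4, $2, $3
        }'
}
```

Usage: ss -ltn | backlog_pressure 90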
Step 4: Identify Per-Process Connections
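ss -tanp | grep 'pid=1234,'
Replace 1234 with the target process ID. Matching on pid= avoids false hits on ports or addresses that happen to contain the same digits.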
Or:
sudo lsof -iTCP -sTCP:ESTABLISHED
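When the offending PID is not yet known, the same ss output can be grouped by owner. A rough sketch, assuming the users:(("name",pid=N,fd=M)) format that ss -p prints; conns_per_process is a hypothetical helper, and it counts only the first process listed for each socket:

```shell
# Count TCP sockets per owning process.
# Reads `ss -tanp` output on stdin; prints "count pid name", highest first.
conns_per_process() {
    grep -o 'users:(("[^"]*",pid=[0-9]*' |
        sed -E 's/^users:\(\("([^"]*)",pid=([0-9]+)$/\2 \1/' |
        awk '{ n[$0]++ } END { for (k in n) print n[k], k }' |
        sort -rn
}
```

Usage: sudo ss -tanp | conns_per_process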
Step 5: Inspect Kernel Network Stats
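netstat -s
If netstat is not installed (it ships with the older net-tools package), nstat -az from iproute2 reports the same kernel counters by name.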
Look for:
- TCP retransmissions
- Listen queue overflows
- Dropped packets
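The listen queue counters can also be read directly from /proc/net/netstat, which stores TcpExt counters as a line of names followed by a line of values. A minimal sketch under that assumption; listen_counters is a hypothetical helper name:

```shell
# Print the ListenOverflows and ListenDrops counters.
# Reads /proc/net/netstat-style input on stdin.
listen_counters() {
    awk '
        # First TcpExt line holds the counter names.
        $1 == "TcpExt:" && !have_names {
            for (i = 2; i <= NF; i++) name[i] = $i
            have_names = 1
            next
        }
        # Second TcpExt line holds the values in the same order.
        $1 == "TcpExt:" {
            for (i = 2; i <= NF; i++)
                if (name[i] == "ListenOverflows" || name[i] == "ListenDrops")
                    print name[i], $i
        }'
}
```

Usage: listen_counters < /proc/net/netstat. The values are cumulative since boot, so take two readings a few seconds apart and compare them rather than reading a single number.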
Step 6: Packet-Level Inspection
sudo tcpdump -ni any port 443
Useful for confirming whether packets arrive or are dropped upstream.
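On a busy port a full capture is noisy. One option is to capture only connection attempts and resets with a pcap filter on the TCP flags. A small sketch; syn_rst_filter is a hypothetical helper that only builds the filter string:

```shell
# Build a pcap filter matching only SYN and RST segments on one TCP port.
syn_rst_filter() {
    printf 'tcp port %d and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)\n' "$1"
}
```

Usage: sudo tcpdump -ni any "$(syn_rst_filter 443)"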
Common Production Failure Patterns
- SYN flood: many SYN-RECV entries
- TIME-WAIT storm: high connection churn
- CLOSE-WAIT accumulation: the peer has closed the connection but the application has not closed its own socket
- Connection resets: backend crashing or overloaded
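The state-based patterns above can be checked mechanically. A rough sketch, assuming ss -tan output on stdin; flag_states is a hypothetical helper, and its default threshold of 1000 is an arbitrary placeholder that should be set from the host's normal baseline:

```shell
# Flag suspicious TCP state counts in `ss -tan` output read from stdin.
# $1 is the per-state threshold (default 1000, a placeholder value).
flag_states() {
    awk -v limit="${1:-1000}" '
        $1 == "SYN-RECV"   { syn++ }
        $1 == "TIME-WAIT"  { tw++ }
        $1 == "CLOSE-WAIT" { cw++ }
        END {
            if (syn + 0 > limit + 0) print "SYN-RECV " syn ": possible SYN flood or saturated backlog"
            if (tw + 0 > limit + 0)  print "TIME-WAIT " tw ": high connection churn"
            if (cw + 0 > limit + 0)  print "CLOSE-WAIT " cw ": application may not be closing sockets"
        }'
}
```

Usage: ss -tan | flag_states 500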
Mitigation
- Increase the backlog the application passes to listen(), if justified
- Tune net.core.somaxconn, which caps the backlog any application can request
- Fix application connection handling
- Enable connection pooling
- Scale horizontally
- Adjust load balancer settings
Check somaxconn:
cat /proc/sys/net/core/somaxconn
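The kernel silently caps the backlog passed to listen() at net.core.somaxconn, so raising only one of the two values may have no effect. A tiny sketch of that rule; effective_backlog is a hypothetical helper and 511 below is only an example value:

```shell
# Print the accept queue limit the kernel will actually apply.
# $1 = backlog the application passes to listen(), $2 = net.core.somaxconn
effective_backlog() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}
```

Usage: effective_backlog 511 "$(cat /proc/sys/net/core/somaxconn)"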
Production Safety
- Investigate before tuning kernel parameters
- Avoid random sysctl changes without load testing
- Correlate socket states with application logs
- Check file descriptor limits alongside socket counts
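For the file descriptor check, a process's open descriptor count can be compared with its soft limit through /proc. A minimal sketch, assuming the standard /proc/<pid>/limits layout; nofile_soft_limit and fd_usage are hypothetical helper names:

```shell
# Print the soft "Max open files" limit from /proc/<pid>/limits-style input.
nofile_soft_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Compare a process's open descriptor count with its soft limit.
# Needs permission to read /proc/<pid>/fd (root for other users' processes).
fd_usage() {
    fd_pid=$1
    fd_open=$(ls "/proc/$fd_pid/fd" | wc -l)
    fd_soft=$(nofile_soft_limit < "/proc/$fd_pid/limits")
    echo "pid=$fd_pid open_fds=$fd_open soft_limit=$fd_soft"
}
```

Usage: sudo sh -c '. ./helpers.sh; fd_usage 1234', where helpers.sh is a hypothetical file holding the functions above and 1234 is the target PID. Every socket consumes one descriptor, so an open count near the soft limit explains accept() failures even when the backlog looks healthy.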
Verification Checklist
ss -ltnp
ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c
netstat -s | grep -i listen
- No excessive SYN-RECV buildup
- No abnormal CLOSE-WAIT accumulation
- Backlog not saturated
- Connection states consistent with traffic level
Why This Matters in Real Infrastructure
Many “application bugs” are actually socket-level issues. Understanding TCP states and connection patterns allows engineers to isolate network bottlenecks quickly and prevent cascading outages during traffic spikes.