Filesystems (ext4/xfs) Basics
Filesystems in Production (ext4 vs XFS)
In production, filesystems do not fail politely. They remount read-only, corrupt metadata, refuse writes, or lie about available space.
You do not need academic filesystem theory. You need to understand:
- How ext4 and XFS behave after a crash
- How journaling affects recovery
- What to do when the system remounts read-only
- Why “No space left on device” can happen even when df looks fine
ext4 vs XFS – Production Differences
ext4
- Very common default filesystem
- Uses journaling (metadata journal)
- Supports fsck (offline repair)
- Safer for small/medium disks
XFS
- Designed for large-scale systems
- High parallel I/O performance
- No practical shrink support (grow-only with xfs_growfs, so size volumes carefully up front)
- Uses xfs_repair (fsck.xfs does NOT repair)
Production rule: ext4 is conservative and predictable. XFS scales better under heavy write workloads.
Scenario 1 — Filesystem Remounted Read-Only
Symptoms
- Application logs show write failures
- System errors: "Read-only file system"
- Container crashes
Diagnosis
dmesg -T | grep -i "read-only"
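mount | grep -E "\(ro[,)]"
If a filesystem that should be writable shows ro in its mount options, the kernel most likely forced it read-only after detecting corruption or an I/O error. On ext4 this behavior is governed by the errors= mount option (commonly errors=remount-ro).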
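Parsing /proc/mounts is a slightly more robust way to spot read-only mounts than grepping mount output. The sketch below is one way to do it; the function name list_ro_mounts is made up for illustration. It also lists filesystems that are read-only on purpose (for example squashfs images), so treat the output as a starting point, not a verdict.

```shell
#!/bin/sh
# List mount point and filesystem type for every entry whose options include "ro".
# Input uses the /proc/mounts layout: device mountpoint fstype options dump pass
# list_ro_mounts is a hypothetical helper name, not a standard tool.
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $2, $3; break }
        }
    }' "$mounts_file"
}
```

Calling list_ro_mounts with no argument reads the live /proc/mounts; passing a file path lets you inspect a copy captured from another host.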
Why It Happens
- Disk I/O errors
- Metadata corruption
- Power loss during write
- Storage backend instability
Fix Strategy
Step 1: Do NOT blindly remount read-write. The tempting shortcut is:
sudo mount -o remount,rw /
If corruption exists, further writes may worsen the damage. Find the cause first.
Step 2: Check filesystem type:
df -T
If ext4:
Unmount (if possible; for the root filesystem this means booting into rescue media) and run fsck:
sudo umount /dev/sdX1
sudo fsck -f /dev/sdX1
If XFS:
XFS cannot be repaired while mounted.
sudo umount /dev/sdX1
sudo xfs_repair /dev/sdX1
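Important: fsck.xfs neither checks nor repairs. It exists only so that boot-time fsck has something to call, and it exits successfully without touching the filesystem. For a check that makes no changes, use xfs_repair -n.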
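Before repairing anything, a dry run shows what the tool would change. The helper below (check_cmd_for is a hypothetical name) only prints the no-modify command for a given filesystem type and never executes it. It assumes e2fsck -n and xfs_repair -n, both of which inspect the device without writing to it. The device should still be unmounted first.

```shell
#!/bin/sh
# Print the dry-run (no-modify) check command for a filesystem type.
# check_cmd_for is a hypothetical helper; it prints, it does not execute.
check_cmd_for() {
    fstype="$1"
    device="$2"
    case "$fstype" in
        ext2|ext3|ext4)
            # -f forces a full check, -n answers "no" to every repair prompt
            echo "e2fsck -fn $device"
            ;;
        xfs)
            # -n is xfs_repair's no-modify mode
            echo "xfs_repair -n $device"
            ;;
        *)
            echo "unsupported filesystem type: $fstype" >&2
            return 1
            ;;
    esac
}
```

One possible use is check_cmd_for "$(lsblk -no FSTYPE /dev/sdX1)" /dev/sdX1, reviewing the printed command before running it with sudo.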
Scenario 2 — "No Space Left on Device" but df Shows Free Space
Symptoms
- Error: ENOSPC
- df -h shows free disk space
Common Causes
- Inodes exhausted
- Reserved blocks (ext4)
- Open deleted files still consuming space
Diagnosis
df -i
If IUse% is 100%, inodes are exhausted.
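Waiting for 100% means finding out during the outage. A rough sketch of an early-warning filter follows; inode_hogs is a made-up name, the default threshold of 90 is an arbitrary assumption, and the parsing assumes mount points without spaces.

```shell
#!/bin/sh
# Read `df -iP` output on stdin and print mount points whose inode usage
# is at or above a threshold (percent). inode_hogs is a hypothetical helper.
inode_hogs() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)
        # Pseudo filesystems report "-" instead of a percentage; skip them.
        if (pct ~ /^[0-9]+$/ && pct + 0 >= t + 0) print $6, $5
    }'
}
```

Typical use would be df -iP | inode_hogs 90, for example from a cron job or a monitoring check.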
Check for deleted open files:
sudo lsof | grep deleted
If large files are deleted but still held open by a process, the space is not freed until that process closes the file descriptor or exits.
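When lsof is not installed, the same information is available under /proc on Linux. The following is a minimal sketch under that assumption; deleted_open_files is a hypothetical name, and it relies on GNU stat for the size column. Without root it only sees processes owned by the current user.

```shell
#!/bin/sh
# Print "fd-path size target" for every open file descriptor whose target
# has been deleted. Optional argument: a PID (default: all numeric PIDs).
# deleted_open_files is a hypothetical helper; Linux /proc is assumed.
deleted_open_files() {
    pid_glob="${1:-[0-9]*}"
    for fd in /proc/$pid_glob/fd/*; do
        # Unreadable or vanished descriptors are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                # stat -L follows the /proc link to the still-open file.
                size=$(stat -L -c %s "$fd" 2>/dev/null || echo "?")
                echo "$fd $size $target"
                ;;
        esac
    done
}
```

Running sudo sh -c with this function defined, then sorting on the second column, points at the processes holding the most unreclaimed space.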
Scenario 3 — Journal Replay After Crash
After a sudden reboot, ext4 replays its journal automatically at mount time if the filesystem was not cleanly unmounted.
dmesg -T | grep -i "recovery"
ext4 journal recovery is automatic and usually safe.
XFS also performs log recovery:
dmesg -T | grep -i xfs
If journal replay fails, the mount usually fails as well, and manual repair with fsck (ext4) or xfs_repair (XFS) may be required.
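The two grep commands above can be folded into a single filter. The sketch below assumes the kernel's usual wording ("recovery" in EXT4-fs lines, "Starting recovery" and "Ending recovery" in XFS lines); message text can vary across kernel versions, so the patterns are a best guess. journal_events is a made-up name.

```shell
#!/bin/sh
# Filter kernel log text on stdin down to journal/log recovery messages.
# journal_events is a hypothetical helper; patterns assume common kernel wording.
journal_events() {
    grep -E 'EXT4-fs .*recovery|XFS .*(Starting|Ending) recovery'
}
```

A typical invocation would be dmesg -T | journal_events. An XFS device that shows a "Starting" line with no matching "Ending" line deserves a closer look.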
Scenario 4 — Filesystem Corruption on Large XFS Volume
XFS corruption symptoms:
- Metadata errors in dmesg
- Cannot create files
- Directory listing fails
dmesg -T | grep -i "xfs"
Repair process:
sudo umount /data
sudo xfs_repair -v /dev/sdX1
Never run xfs_repair on a mounted filesystem.
Reserved Blocks in ext4
By default, ext4 reserves about 5% of its blocks for root.
sudo tune2fs -l /dev/sdX1 | grep "Reserved block"
Adjust (carefully) if needed:
sudo tune2fs -m 1 /dev/sdX1
Reducing reserved space increases usable capacity but reduces safety margin.
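To see the current reservation as a percentage rather than a raw block count, the two relevant tune2fs -l fields can be divided. This is a small sketch that assumes the "Block count" and "Reserved block count" labels printed by tune2fs -l; reserved_pct is a hypothetical name.

```shell
#!/bin/sh
# Read `tune2fs -l` output on stdin and print reserved blocks as a
# percentage of total blocks. reserved_pct is a hypothetical helper.
reserved_pct() {
    awk -F: '
        $1 == "Block count"          { total = $2 + 0 }
        $1 == "Reserved block count" { reserved = $2 + 0 }
        END {
            if (total > 0) printf "%.1f\n", 100 * reserved / total
            else exit 1
        }'
}
```

For example, sudo tune2fs -l /dev/sdX1 | reserved_pct run before and after tune2fs -m confirms the change took effect.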
Mental Model
- Filesystem is not just storage — it is metadata + allocation logic
- ext4 is stable and conservative
- XFS is scalable and parallel-friendly
- Read-only remount is a protection mechanism
- Journal replay protects metadata consistency
Common Production Mistakes
- Running fsck on a mounted filesystem
- Using fsck on XFS instead of xfs_repair
- Ignoring dmesg I/O errors
- Remounting rw without root cause analysis
- Forgetting inode limits
- Not checking deleted-but-open files
Production Checklist
- Identify filesystem type with df -T
- Check kernel logs before any repair
- Unmount before running repair tools
- Check inode usage (df -i)
- Check open deleted files (lsof)
- Validate storage backend health
- Test write operations after repair
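The last checklist item can be scripted. The sketch below writes a probe file, flushes it, reads it back, and cleans up. write_test and the probe file name are made up for illustration, and a passing result only shows that a small write succeeded, not that the underlying storage is healthy.

```shell
#!/bin/sh
# Write, flush, read back, and remove a small probe file in a directory.
# Returns 0 if the round trip worked. write_test is a hypothetical helper.
write_test() {
    dir="$1"
    probe=$(mktemp "$dir/.write-test.XXXXXX") || return 1
    if ! printf 'probe\n' > "$probe"; then
        rm -f "$probe"
        return 1
    fi
    sync                      # push dirty pages to the device
    content=$(cat "$probe")
    rm -f "$probe"
    [ "$content" = "probe" ]
}
```

A possible use after a repair is write_test /data && echo "write OK", run as the same user the application runs as.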