Reading and Writing Text Safely (Encoding)
Always Specify Encoding
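Encoding mismatches tend to cause bugs that show up only in production, where the platform's default encoding may differ from the one on your development machine. Pass encoding="utf-8" explicitly and decide deliberately how decoding errors are handled.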
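As a small illustration of why the mismatch is easy to miss, the sketch below decodes the same UTF-8 bytes with two codecs. The choice of cp1252 as the "wrong" codec is an assumption made for the example (it is a common legacy default on Windows); the point is that the wrong decode raises no exception and silently yields garbled text.

```python
# Sketch: identical bytes, two different decoders.
original = "café"
data = original.encode("utf-8")      # b'caf\xc3\xa9'

correct = data.decode("utf-8")       # matches the encoding used to write
garbled = data.decode("cp1252")      # wrong codec (assumed for illustration), no error raised

print(correct)
print(garbled)
```

Because nothing fails loudly, this kind of bug usually surfaces later as corrupted text rather than as a traceback.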
Safe Read/Write
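from pathlib import Path

p = Path("notes.txt")
# Decode explicitly as UTF-8 instead of relying on the platform default.
text = p.read_text(encoding="utf-8")
# Append a newline and write back with the same explicit encoding.
p.write_text(text + "\n", encoding="utf-8")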
Streaming Large Files
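from pathlib import Path

p = Path("huge.log")
# Iterate lazily so only one line is held in memory at a time;
# errors="replace" substitutes U+FFFD for undecodable bytes.
with p.open("r", encoding="utf-8", errors="replace") as f:
    for line in f:
        if "ERROR" in line:
            print(line.rstrip("\n"))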
Newlines and Normalization
Normalize line endings (CRLF, CR, and LF) to a single form, typically "\n", when processing input that may come from different platforms.
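A minimal sketch of one way to do this follows. The helper name normalize_newlines is hypothetical, and the example assumes the text arrived as raw bytes (for instance from a network response), where Python's text-mode newline translation has not been applied.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and bare CR line endings to LF (hypothetical helper)."""
    # Replace CRLF first so the "\r" in "\r\n" is not converted twice.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Mixed line endings, as might come from files produced on different platforms.
raw = b"alpha\r\nbeta\rgamma\n"
text = normalize_newlines(raw.decode("utf-8"))

print(text.splitlines())
```

When a file is opened in text mode with the default newline setting, Python already translates these endings to "\n" on read, so a helper like this matters mainly for bytes decoded by hand or files opened with newline="".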
Operational Checklist
- Encoding is always specified on open/read/write.
- For large files, stream line-by-line (do not read all into memory).
- Choose an explicit error policy: strict, replace, or ignore.
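To make the three error policies concrete, here is a rough sketch that decodes one invalid byte sequence under each of them. The sample bytes and the helper decode_with_policy are made up for illustration.

```python
# Latin-1 bytes for "café!": the 0xE9 byte is not valid UTF-8 in this position.
data = "café!".encode("latin-1")     # b'caf\xe9!'


def decode_with_policy(data: bytes, errors: str) -> str:
    """Decode as UTF-8 using the given error policy (hypothetical helper)."""
    return data.decode("utf-8", errors=errors)


# strict: fail loudly so the bad input can be investigated.
try:
    decode_with_policy(data, "strict")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(f"strict: failed at byte offset {exc.start}")

# replace: keep going, marking each bad byte with U+FFFD.
replaced = decode_with_policy(data, "replace")

# ignore: keep going, silently dropping the bad byte.
ignored = decode_with_policy(data, "ignore")

print(repr(replaced), repr(ignored))
```

strict suits data that must be correct, replace suits logs and diagnostics where a visible marker is acceptable, and ignore loses information without leaving a trace, so it is usually the last resort.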
Failure Modes
- UnicodeDecodeError: unexpected byte sequences in input.
- Memory blowup: reading multi-GB files at once.
- Invisible whitespace: trailing newlines and CRLF cause subtle mismatches.
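The invisible-whitespace failure can be reproduced in a few lines. This is only a sketch: the sample line stands in for one read from a CRLF-terminated file without newline translation, and the value being compared is invented for the example.

```python
expected = "status=ok"

# A line as it might appear when a CRLF file is read with newline="".
line = "status=ok\r\n"

# Stripping only "\n" leaves a trailing "\r", so the comparison fails.
lf_only = line.rstrip("\n")
print(repr(lf_only), lf_only == expected)

# Stripping both characters removes the whole line terminator.
both = line.rstrip("\r\n")
print(repr(both), both == expected)
```

Printing with repr() is a cheap way to surface such characters, since a trailing "\r" is otherwise invisible in most terminals.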