Reading and Writing Text Safely (Encoding)

Read and write text safely using explicit encodings, newline handling, and streaming patterns for large files.

Always Specify Encoding

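Encoding mismatches cause bugs that often surface only in production. When no encoding is given, Python falls back to a platform-dependent default (commonly UTF-8 on Linux and macOS, often a legacy code page such as cp1252 on Windows), so code that works on a developer machine can misread the same file elsewhere. Pass encoding="utf-8" explicitly and decide deliberately how decode errors are handled.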
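
As a small illustration of what a mismatch looks like, the sketch below decodes the same bytes with two encodings. The sample string and variable names are made up for this example; cp1252 is used only as a typical mismatched default.

```python
# Illustration only: the same bytes decoded with two different encodings.
data = "café".encode("utf-8")        # the writer used UTF-8

as_utf8 = data.decode("utf-8")       # reader agrees with the writer
as_cp1252 = data.decode("cp1252")    # reader guesses a legacy code page

print(as_utf8)
print(as_cp1252)  # mojibake: the single 'é' becomes two unrelated characters
```

No exception is raised in the mismatched case, which is why this kind of bug tends to go unnoticed until the garbled text reaches a user or a downstream system.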

Safe Read/Write

from pathlib import Path

p = Path("notes.txt")
text = p.read_text(encoding="utf-8")

# Append a trailing newline and write the text back as UTF-8.
p.write_text(text + "\n", encoding="utf-8")

Streaming Large Files

from pathlib import Path

p = Path("huge.log")
with p.open("r", encoding="utf-8", errors="replace") as f:
    for line in f:
        if "ERROR" in line:
            # Drop the trailing newline so print() does not add a second one.
            print(line.rstrip("\n"))

Newlines and Normalization

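Normalize line endings when processing cross-platform inputs. Files written on Windows typically end lines with CRLF (\r\n), while Unix-like systems use LF (\n). In text mode, open() translates both to \n on read by default, but text decoded directly from raw bytes keeps whatever endings it arrived with.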
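
One possible way to normalize text that is already in memory is sketched below. The helper name normalize_newlines is hypothetical (it does not come from this page or from the standard library), and the sample bytes are invented for the example.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first; replacing CR first would turn each CRLF into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Text decoded from raw bytes keeps its original line endings.
raw = b"alpha\r\nbeta\rgamma\n".decode("utf-8")
clean = normalize_newlines(raw)

print(clean.splitlines())
```

When writing, passing newline="\n" to open() keeps LF endings on every platform instead of letting the platform default decide.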

Operational Checklist

  • Encoding is always specified on open/read/write.
  • For large files, stream line-by-line (do not read all into memory).
  • Choose an explicit error policy: strict, replace, or ignore.
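
To make the three error policies concrete, here is a minimal sketch that decodes one invalid byte sequence under each of them. The sample bytes and variable names are assumptions chosen for illustration.

```python
# 0xE9 is 'é' in Latin-1 but is not valid UTF-8 in this position.
data = b"caf\xe9 ok"

# strict (the default): fail loudly on the first undecodable byte.
try:
    data.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# replace: substitute U+FFFD for each undecodable byte.
replaced = data.decode("utf-8", errors="replace")

# ignore: drop undecodable bytes silently, so the data loss is invisible.
ignored = data.decode("utf-8", errors="ignore")

print(strict_failed, repr(replaced), repr(ignored))
```

As a rough guide, strict suits data you control, replace suits logs and diagnostics where a visible marker is acceptable, and ignore is rarely a good choice because it hides the loss.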

Failure Modes

  • UnicodeDecodeError: unexpected byte sequences in input.
  • Memory blowup: reading multi-GB files at once.
  • Invisible whitespace: trailing newlines and CRLF cause subtle mismatches.
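
When a UnicodeDecodeError does occur, the exception itself says where the bad bytes are. The sketch below is one way to turn that into a short report; describe_decode_failure is a hypothetical helper written for this example, not an existing API.

```python
def describe_decode_failure(data: bytes, encoding: str = "utf-8") -> str:
    """Return a short report on the first undecodable bytes, or 'ok'."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start and exc.end mark the offending range within exc.object.
        bad = exc.object[exc.start:exc.end]
        return f"bad bytes {bad!r} at offset {exc.start}: {exc.reason}"
    return "ok"


report = describe_decode_failure(b"caf\xe9 ok")
print(report)

# repr() makes otherwise invisible trailing characters such as '\r' visible.
print(repr("value\r"))
```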