JSON Encoding Pitfalls

UTF-8, escaping, control chars, and broken payloads in the wild.

Encoding problems are invisible until they break clients

Most JSON “bugs” in production are not about syntax. They are about encoding, escaping, and hidden characters. Your API should be strict about UTF-8 and consistent about escaping.

Use UTF-8 everywhere

  • Store text as UTF-8 in the database.
  • Output UTF-8 from the server.
  • Return JSON with Content-Type: application/json.
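
The output side of this list can be sketched in Python. This is a minimal illustration, not a framework recipe: the helper name json_response and the sample payload are hypothetical, and a real server would hand the headers and body to whatever response object it uses.

```python
import json


def json_response(payload):
    """Return (headers, body) for a JSON response encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    # ensure_ascii=False keeps non-ASCII characters as real characters;
    # the explicit .encode("utf-8") makes the wire encoding unambiguous
    # instead of relying on a platform default.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return headers, body


headers, body = json_response({"city": "Zürich"})
```

Note that Content-Length counts bytes. A string with non-ASCII characters is longer in bytes than in characters, which is a frequent source of truncated payloads.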

Control characters and invalid JSON

JSON requires control characters (U+0000 through U+001F) inside strings to be escaped. A raw newline inside a JSON string makes the payload invalid, and strict parsers will reject it. Always serialize JSON with a real encoder and never build JSON strings manually.

Example: a classic broken JSON string

{
  "note": "Line1
Line2"
}

Example: correct JSON string with escaped newline

{
  "note": "Line1\nLine2"
}
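
The two examples above can be reproduced with Python's standard json module. This is a small sketch; the variable names are only illustrative.

```python
import json

# The Python "\n" here becomes a raw newline inside the JSON string,
# which is the broken form shown in the first example.
broken = '{"note": "Line1\nLine2"}'

try:
    json.loads(broken)
    parsed_broken = True
except json.JSONDecodeError:
    # The default (strict) parser rejects unescaped control characters.
    parsed_broken = False

# A real encoder escapes the newline as the two characters \ and n.
encoded = json.dumps({"note": "Line1\nLine2"})
decoded = json.loads(encoded)
```

Here the broken payload fails to parse, while the encoder output round-trips back to the original two-line string.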

HTML vs JSON escaping

Do not HTML-escape JSON. A common bug is returning entities like &quot; inside JSON text fields. JSON needs JSON escaping, not HTML escaping.
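
A short Python sketch of the difference, using the standard html and json modules. The sample text and the field name msg are made up for illustration.

```python
import html
import json

text = 'She said "hi" & left'

# Wrong: HTML-escaping first bakes entities into the data itself.
wrong = json.dumps({"msg": html.escape(text)})

# Right: pass the raw text to the JSON encoder, which applies JSON escaping.
right = json.dumps({"msg": text})

wrong_value = json.loads(wrong)["msg"]
right_value = json.loads(right)["msg"]
```

After decoding, right_value equals the original text, while wrong_value still contains HTML entities that every non-HTML client would have to undo. HTML escaping belongs at the point where text is rendered into HTML, not in the API payload.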

Unicode edge cases

  • Normalize input if you compare strings (especially for search).
  • Be careful with visually similar characters (look-alikes) in identifiers.
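
Both points can be illustrated with Python's unicodedata module. This is a sketch that assumes NFC as the normalization form; the helper name same_text is hypothetical, and the right form depends on how your data is stored and searched.

```python
import unicodedata


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (assumed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# Cyrillic "а" (U+0430) looks like Latin "a" but is a different character.
lookalike = "p\u0430ypal"
```

The composed and decomposed strings render identically and compare equal only after normalization. The look-alike string stays different from "paypal" even after normalization, so identifiers need a separate check, such as restricting the allowed scripts.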

JSON injection concerns

Never concatenate untrusted strings into JSON output. Always use proper JSON encoding. For logs, store raw JSON separately or escape safely to avoid log injection.
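
A minimal Python sketch of why concatenation is dangerous. The field names (name, admin, event, user) and the attacker strings are invented for illustration.

```python
import json

# Untrusted input crafted to close the string and add a new key.
user_input = 'x", "admin": true, "y": "'

# Unsafe: concatenation lets the input change the structure of the document.
unsafe = '{"name": "' + user_input + '"}'
unsafe_parsed = json.loads(unsafe)

# Safe: the encoder escapes the quotes, so the input stays a plain string value.
safe = json.dumps({"name": user_input})
safe_parsed = json.loads(safe)

# For logs: encoding each entry as JSON keeps it on one line, because
# newlines in the untrusted value are escaped rather than written raw.
log_line = json.dumps({"event": "login", "user": "bob\nFAKE ENTRY"})
```

The unsafe payload parses into an object with an injected admin key, while the safe payload has exactly one key whose value is the attacker's text. The log line contains no raw newline, so the input cannot forge a second log entry.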

Common mistakes

  • Building JSON manually with string concatenation
  • Mixing HTML escaping with JSON serialization
  • Wrong charset output causing mojibake (broken characters)

Checklist

  • Server always outputs UTF-8 JSON.
  • JSON is serialized by a real encoder, not manual string building.
  • No HTML-escaped entities appear inside JSON payloads.