JSON Encoding Pitfalls

UTF-8, escaping, control chars, and broken payloads in the wild.


Encoding problems are invisible until they break clients

Most JSON “bugs” in production are not about syntax. They are about encoding, escaping, and hidden characters. Your API should be strict about UTF-8 and consistent about escaping.

Use UTF-8 everywhere

Store text as UTF-8 in the database.
Output UTF-8 from the server.
Return JSON with the Content-Type application/json.
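Below is a minimal Python sketch of these three rules. It assumes a hand-rolled response rather than any particular web framework, and json_response is a hypothetical helper name. With ensure_ascii=False the encoder keeps characters like ü as literal text, so the body has to be encoded to UTF-8 explicitly. The default, ensure_ascii=True, emits \uXXXX escapes instead; that output is plain ASCII, which is also valid UTF-8.

```python
import json

def json_response(payload):
    """Hypothetical helper: serialize payload to UTF-8 JSON bytes plus headers."""
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes,
    # so the resulting str must be encoded to UTF-8 bytes explicitly.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return body, headers

body, headers = json_response({"city": "Zürich"})
print(body)  # b'{"city": "Z\xc3\xbcrich"}'
```

RFC 8259 defines no charset parameter for application/json. Some servers send application/json; charset=utf-8 anyway, and compliant recipients ignore the parameter.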

Control characters and invalid JSON

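Unescaped control characters (U+0000 through U+001F, such as a raw newline or tab) are not allowed inside JSON strings, so a payload containing one is invalid and strict parsers reject it. Always serialize JSON with a real encoder and never build JSON strings manually.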
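As an illustration, here is Python's standard json module, used as one example of a real encoder. It escapes every control character in a string value: common ones get short escapes like \n and \t, and the rest get \u00XX.

```python
import json

# A string with a newline, a tab, and U+0001 (a non-printing control character).
text = "Line1\nLine2\tend\x01"

encoded = json.dumps({"note": text})
print(encoded)  # {"note": "Line1\nLine2\tend\u0001"}

# The encoder emitted escape sequences, so no raw control character remains.
assert not any(ord(ch) < 0x20 for ch in encoded)
# Decoding restores the original string exactly.
assert json.loads(encoded)["note"] == text
```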

Example: a classic broken JSON string

{
  "note": "Line1
Line2"
}

Example: correct JSON string with escaped newline

{
  "note": "Line1\nLine2"
}
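You can check both examples with a strict parser. This Python sketch uses json.loads, which is strict by default. Passing strict=False makes it accept raw control characters, and that kind of leniency can hide the bug until a stricter client receives the payload. The parses function is a hypothetical helper.

```python
import json

def parses(text: str) -> bool:
    """Hypothetical helper: True if text is valid JSON under the default (strict) parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# Raw newline inside the string value, as in the broken example.
broken = '{\n  "note": "Line1\nLine2"\n}'
# Backslash followed by "n", as in the corrected example.
fixed = '{\n  "note": "Line1\\nLine2"\n}'

print(parses(broken))  # False
print(parses(fixed))   # True
# The encoder produces the corrected form on its own.
print(json.dumps({"note": "Line1\nLine2"}))  # {"note": "Line1\nLine2"}
```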

HTML vs JSON escaping

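Do not HTML-escape JSON. A common bug is returning entities like &quot; or &amp; inside JSON text fields. JSON needs JSON escaping (for example \" for a double quote), not HTML escaping. Apply HTML escaping only where the text is rendered into a page.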
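This minimal Python sketch contrasts the two approaches, using the standard html and json modules. If the value is HTML-escaped before serialization, the entities become part of the data, and every client that parses the payload receives them.

```python
import html
import json

comment = 'She said "hi" & left'

# Wrong: HTML-escaping before JSON serialization stores entities in the data itself.
wrong = json.dumps({"comment": html.escape(comment)})
# Right: the JSON encoder handles quotes and backslashes on its own.
right = json.dumps({"comment": comment})

print(wrong)  # {"comment": "She said &quot;hi&quot; &amp; left"}
print(right)  # {"comment": "She said \"hi\" & left"}

# Parsing the wrong payload yields the entities, not the original text.
assert json.loads(wrong)["comment"] != comment
assert json.loads(right)["comment"] == comment
```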

Unicode edge cases

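Normalize input (for example to Unicode NFC) before comparing strings, especially for search.
Be careful with visually similar characters (look-alikes, such as Cyrillic "а" and Latin "a") in identifiers; normalization does not merge them.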
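Both points can be sketched with Python's unicodedata module. The NFC-plus-casefold comparison is one reasonable choice for search, not the only one. The ASCII-only identifier rule in is_plain_identifier is a hypothetical policy used to show that look-alikes need their own check.

```python
import unicodedata

def normalize_for_search(s: str) -> str:
    # NFC turns "e" + U+0301 (combining acute) into the single code point U+00E9;
    # casefold() then makes the comparison case-insensitive.
    return unicodedata.normalize("NFC", s).casefold()

composed = "Caf\u00e9"     # é as one precomposed code point
decomposed = "cafe\u0301"  # e followed by a combining accent

print(composed == decomposed)  # False
print(normalize_for_search(composed) == normalize_for_search(decomposed))  # True

# Look-alikes are NOT merged by normalization: Cyrillic U+0430 differs from Latin "a".
latin, cyrillic = "admin", "\u0430dmin"
print(unicodedata.normalize("NFKC", latin) == unicodedata.normalize("NFKC", cyrillic))  # False

def is_plain_identifier(s: str) -> bool:
    # Hypothetical policy: accept only ASCII letters, digits, "_" and "-".
    return s.isascii() and all(ch.isalnum() or ch in "_-" for ch in s)

print(is_plain_identifier(latin), is_plain_identifier(cyrillic))  # True False
```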

JSON injection concerns

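Never concatenate untrusted strings into JSON output, because a crafted value can close the string and inject its own keys. Always use proper JSON encoding. For logs, store raw JSON in a separate field or escape it (for example by JSON-encoding the value) so that embedded newlines cannot forge extra log entries.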
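A short Python sketch of the injection follows; the input value is made up for illustration. Python's json module keeps the last value for a duplicate key. Other parsers may keep the first value or reject the object, and RFC 8259 warns that this behavior is unpredictable across implementations. The last part shows JSON-encoding a value before writing it to a log line.

```python
import json

# Untrusted input crafted to close the string and smuggle in an extra key.
name = 'mallory", "role": "admin'

# Wrong: concatenation turns the input into JSON syntax.
concatenated = '{"role": "viewer", "user": "' + name + '"}'
# The text is now {"role": "viewer", "user": "mallory", "role": "admin"};
# Python's json module keeps the last duplicate key, so the role changes.
print(json.loads(concatenated)["role"])  # admin

# Right: the encoder escapes the quotes, so the input stays inside one string value.
encoded = json.dumps({"role": "viewer", "user": name})
print(json.loads(encoded)["role"])  # viewer

# Logs: JSON-encoding the value turns an embedded newline into \n,
# so a forged second log line cannot appear.
untrusted = "ok\nFAKE ENTRY level=admin"
log_line = "event=" + json.dumps(untrusted)
print(log_line)  # event="ok\nFAKE ENTRY level=admin"
```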

Common mistakes

Building JSON manually with string concatenation
Mixing HTML escaping with JSON serialization
Wrong charset output causing mojibake (broken characters)
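Mojibake is easy to reproduce. This Python sketch assumes a client that decodes UTF-8 bytes as Latin-1. It also shows the reverse mistake, which a strict UTF-8 decoder catches rather than silently corrupting.

```python
text = "café"

# The server encodes UTF-8, but the client decodes as Latin-1 (wrong charset).
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                    # b'caf\xc3\xa9'
print(utf8_bytes.decode("latin-1"))  # cafÃ©

# Reverse mistake: Latin-1 bytes fed to a strict UTF-8 decoder fail loudly.
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```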

Checklist

Server always outputs UTF-8 JSON.
JSON is serialized by a real encoder, not manual string building.
No HTML-escaped entities appear inside JSON payloads.
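Parts of this checklist can be automated against real responses, for example in an integration test. The Python sketch below is one possible approach. The names check_payload and iter_strings are hypothetical, and the entity regex is a crude heuristic that can flag legitimate text. It cannot verify that a real encoder was used, so strict parsing serves as a proxy.

```python
import json
import re

# Crude, hypothetical detector for HTML entities such as &quot; &#39; or &#x27;.
ENTITY = re.compile(r"&(?:[a-zA-Z]+|#[0-9]+|#x[0-9a-fA-F]+);")

def iter_strings(value):
    """Yield every string (object keys and values) in a parsed JSON document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)

def check_payload(body: bytes, content_type: str):
    """Return a list of problems found in a JSON response (empty list if none)."""
    problems = []
    if not content_type.startswith("application/json"):
        problems.append("unexpected Content-Type: " + content_type)
    try:
        text = body.decode("utf-8")  # strict: invalid UTF-8 raises
    except UnicodeDecodeError:
        return problems + ["body is not valid UTF-8"]
    try:
        doc = json.loads(text)  # strict: raw control characters raise
    except json.JSONDecodeError as err:
        return problems + ["invalid JSON: " + err.msg]
    for s in iter_strings(doc):
        if ENTITY.search(s):
            problems.append("HTML entity in string: " + s)
    return problems

print(check_payload(b'{"q": "&quot;hi&quot;"}', "application/json"))
# ['HTML entity in string: &quot;hi&quot;']
print(check_payload('{"q": "café"}'.encode("utf-8"), "application/json"))  # []
```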
