HTML Charsets
HTML Encoding (Character Sets)
To display text correctly, a web browser must know which character set (encoding) the HTML document uses.
The character set is declared with the charset attribute of the <meta> tag, placed near the top of the document's <head>.
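To see why the declaration matters, here is a minimal sketch in Python (chosen only for illustration; a browser does the equivalent decoding internally). It decodes the same bytes once with the right encoding and once with the wrong one. The sample word is an assumption; "cp1252" is Python's codec name for Windows-1252.

```python
# Text as the author typed it, saved to disk as UTF-8 bytes.
text = "caf\u00e9"  # "café"
raw = text.encode("utf-8")

# Decoding with the encoding the file was actually saved in recovers the text.
correct = raw.decode("utf-8")

# Decoding the same bytes as Windows-1252 turns the two-byte "é" into two
# separate characters, the classic garbled-text symptom.
garbled = raw.decode("cp1252")

print(correct)  # café
print(garbled)  # cafÃ©
```

The bytes never change; only the interpretation does, which is why the declared charset has to match the encoding the file was saved in.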
The HTML charset Attribute
The charset attribute specifies the character encoding for the HTML document.
<meta charset="UTF-8">
The current HTML Standard requires new documents to be encoded as UTF-8.
UTF-8 can encode every Unicode character, which covers nearly all characters and symbols used worldwide.
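The HTML Standard expects the encoding declaration to appear within the first 1024 bytes of the file. The sketch below is a heavily simplified illustration of reading that declaration from raw bytes. The function name `sniff_meta_charset`, the regular expression, and the `default` parameter are all hypothetical; real browsers follow a more detailed algorithm and also consult HTTP headers and byte order marks.

```python
import re

# Simplified pattern: finds charset=... inside a <meta> tag, quoted or not.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a <meta> tag within the first 1024 bytes.

    Falls back to `default` (an assumption of this sketch) when no
    declaration is found in that window.
    """
    match = META_CHARSET.search(raw[:1024])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


print(sniff_meta_charset(b'<head><meta charset="UTF-8"></head>'))  # utf-8
```

Because only the first 1024 bytes are inspected, a declaration buried below long comments or scripts would be missed, which is one reason to put it first inside <head>.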
Why UTF-8?
UTF-8 is a Unicode encoding that can represent text from nearly every written language.
It is the default and most widely used encoding on the modern web.
The ASCII Character Set
ASCII is one of the earliest character encoding standards, and the encodings used on the web are built on it.
It defines 128 basic characters, including:
- English letters (A–Z, a–z)
- Digits (0–9)
- Basic symbols (! $ + - ( ) @ < > . # ?)
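As a quick check of the 128-character limit, the following Python sketch (an illustration only; the sample string is an assumption) confirms that the characters listed above all have values below 128 and that a character outside that range cannot be stored as ASCII.

```python
samples = "AZaz09!$+-()@<>.#?"

# Every character listed above has a code value below 128.
codes = [ord(ch) for ch in samples]
all_ascii = all(code < 128 for code in codes)

# ASCII text is stored as exactly one byte per character.
ascii_bytes = samples.encode("ascii")

# A character outside the 128 defined values cannot be encoded as ASCII.
try:
    "\u00e9".encode("ascii")  # "é"
    e_acute_is_ascii = True
except UnicodeEncodeError:
    e_acute_is_ascii = False

print(all_ascii, len(ascii_bytes) == len(samples), e_acute_is_ascii)
```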
The ANSI Character Set (Windows-1252)
Windows-1252, often informally called ANSI, was the original character set used by Western-language versions of Windows.
- Characters 0–127 are identical to ASCII
- Characters 128–159 include special symbols
- Characters 160–255 are identical to ISO-8859-1 and to Unicode code points 160–255 (in UTF-8 these characters are stored as two bytes each)
<meta charset="Windows-1252">
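The three ranges described above can be checked with a short Python sketch. It is an illustration only: "cp1252" and "latin-1" are Python's codec names for Windows-1252 and ISO-8859-1, and the five sample bytes are an arbitrary selection.

```python
# Bytes 0-127 decode exactly as they do in ASCII.
lower_range_matches = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("ascii")
    for b in range(128)
)

# A few of the special symbols that Windows-1252 places in bytes 128-159.
symbols = {
    byte: bytes([byte]).decode("cp1252")
    for byte in (0x80, 0x85, 0x93, 0x94, 0x99)
}

# Bytes 160-255 decode to the same characters as in ISO-8859-1.
upper_range_matches = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in range(160, 256)
)

for byte, char in symbols.items():
    print(f"0x{byte:02X} -> {char}")
```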
The ISO-8859-1 Character Set
ISO-8859-1 was the default character set for HTML 4.
- Identical to ASCII for values 0–127
- Does not define characters 128–159
- Matches Windows-1252 (ANSI) and Unicode code points for values 160–255
HTML 4 example:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
HTML 5 example:
<meta charset="ISO-8859-1">
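The practical difference between ISO-8859-1 and Windows-1252 lies in bytes 128–159. The Python sketch below is an illustration, not a description of browser internals. Python's "latin-1" codec maps those bytes to non-printing control codes, while "cp1252" maps them to symbols. In practice, browsers that follow the WHATWG Encoding Standard treat the label ISO-8859-1 as Windows-1252, so a page declared as ISO-8859-1 is decoded with the Windows-1252 table.

```python
import unicodedata

byte = b"\x80"  # value 128, the first byte of the disputed range

# ISO-8859-1 as implemented by Python: a control code, nothing printable.
as_latin1 = byte.decode("latin-1")
category = unicodedata.category(as_latin1)  # "Cc" means control character

# Windows-1252: the same byte is the euro sign.
as_cp1252 = byte.decode("cp1252")

print(f"latin-1: U+{ord(as_latin1):04X} (category {category})")
print(f"cp1252:  U+{ord(as_cp1252):04X} ({as_cp1252})")
```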
The UTF-8 Character Set
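UTF-8 is backward-compatible with ASCII and extends far beyond it. The ranges below are Unicode code points, not byte values:
- 0–127: identical to ASCII, stored as one byte each
- 128–159: control codes with no printable characters
- 160–255: the same characters as ISO-8859-1 and Windows-1252, but stored as two bytes each
- 256 and above: every other Unicode character (well over 100,000), stored as two to four bytes each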
<meta charset="UTF-8">
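The relationship between a character's number and the bytes UTF-8 uses to store it can be sketched in Python. This is an illustration only; the five sample characters are an arbitrary selection, one from each size class plus an extra two-byte case.

```python
samples = [
    "A",           # U+0041, in the ASCII range
    "\u00e9",      # é, U+00E9, in the 160-255 range
    "\u0100",      # Ā, U+0100, just past 255
    "\u2030",      # ‰, U+2030
    "\U0001F600",  # 😀, U+1F600
]

# Number of bytes UTF-8 needs for each character.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch}: {len(encoded)} byte(s), {encoded.hex(' ')}")

# ASCII characters keep their single ASCII byte in UTF-8.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")

# The same character differs at the byte level between encodings.
e_acute_utf8 = "\u00e9".encode("utf-8")      # two bytes: c3 a9
e_acute_latin1 = "\u00e9".encode("latin-1")  # one byte: e9
```

This is why a file saved as ISO-8859-1 or Windows-1252 does not become valid UTF-8 just by changing the <meta> tag: characters above 127 have to be re-encoded.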
Examples of UTF-8 Characters
Basic Latin
ABCD abcd 0123 ?#$%
Latin Extended
Ā Ă Ą Ć Ĉ Ċ Ē Ĕ Ė Ę
Letters with Diacritical Marks
à á â ã è é ê ẽ ò ó ô õ
General Punctuation
‰ ‱ ⁒ ‼ ⁇ ⁈ ⁉
Super and Subscript
C⁰ Cⁱ C⁴ C⁵ C₆ C₇ C₈
Braille Characters
⠓⠑⠇⠇⠕ ⠺⠕⠗⠇⠙
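Each character shown above has a Unicode code point and an official name. As a small illustration, the Python sketch below looks up one character from several of the groups using the standard `unicodedata` module. The choice of sample characters is an assumption.

```python
import unicodedata

samples = [
    "A",       # Basic Latin
    "\u0100",  # Ā, Latin Extended
    "\u2030",  # ‰, General Punctuation
    "\u2070",  # ⁰, Super and Subscript
    "\u2813",  # ⠓, Braille
]

# Map each character to its official Unicode name.
names = {ch: unicodedata.name(ch) for ch in samples}

for ch, name in names.items():
    print(f"U+{ord(ch):04X} {ch} {name}")
```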
Summary
Saving your pages as UTF-8 and declaring it with <meta charset="UTF-8"> lets browsers correctly display text, symbols, emoji, and international characters.
For modern websites, UTF-8 is the safest and recommended encoding choice.