HTML Charsets

HTML character sets define how text and symbols are encoded and displayed in a web page. UTF-8 is the recommended charset because it supports almost all characters and symbols worldwide.

HTML Encoding (Character Sets)

To display text correctly, a web browser must know which character set (encoding) the HTML document uses.

The character set is declared with the charset attribute of the <meta> element, placed inside the document's <head>.
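To see why the browser needs this information, the following minimal Python sketch decodes the same bytes under two different encoding assumptions. The sample string is an arbitrary illustration, not taken from any real page.

```python
# The same bytes read under two different encoding assumptions.
text = "café ©"

# Bytes as they would be stored in a file saved as UTF-8.
raw = text.encode("utf-8")

# Decoded with the encoding the bytes were actually written in.
right = raw.decode("utf-8")

# Decoded as Windows-1252: each non-ASCII character becomes two wrong ones.
wrong = raw.decode("cp1252")

print(right)  # café ©
print(wrong)  # cafÃ© Â©
```

The garbled second line is roughly what a visitor sees when the declared (or guessed) charset does not match the bytes in the file.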

The HTML charset Attribute

The charset attribute specifies the character encoding for the HTML document.

<meta charset="UTF-8">

The HTML specification strongly recommends using UTF-8.

UTF-8 supports almost all characters and symbols used worldwide.
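The HTML specification also requires the charset declaration to appear within the first 1024 bytes of the document, so it belongs near the top of <head>. The sketch below is a simplified approximation of that check, not the algorithm browsers actually run; the function name sniff_meta_charset and the sample documents are hypothetical.

```python
import re

# Simplified pattern: finds charset=... inside a <meta> tag (quotes optional).
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)

def sniff_meta_charset(html_bytes, limit=1024):
    """Return the charset label declared in the first `limit` bytes, or None."""
    match = META_CHARSET.search(html_bytes[:limit])
    return match.group(1).decode("ascii").lower() if match else None

# Declaration near the top of the document: found.
early = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>x</title></head></html>'

# Declaration pushed past the first 1024 bytes by a long comment: missed.
late = (
    b"<!DOCTYPE html><html><head><!--" + b" " * 2000 +
    b'--><meta charset="UTF-8"></head></html>'
)

print(sniff_meta_charset(early))  # utf-8
print(sniff_meta_charset(late))   # None
```

The same pattern also picks up the older http-equiv form, because that tag carries charset=... inside its content attribute.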

Why UTF-8?

UTF-8 is an encoding of Unicode and can represent text from nearly every written language.

It is the default and most widely used encoding on the modern web.

The ASCII Character Set

ASCII was the first character encoding standard used on the web.

It defines 128 basic characters, including:

  • English letters (A–Z, a–z)
  • Digits (0–9)
  • Basic symbols (! $ + - ( ) @ < > . # ?)
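As a small illustration of the 128-character limit, this Python sketch prints the numeric values of a few ASCII characters and then tries to encode a character that ASCII does not contain. The sample characters are arbitrary.

```python
# Every ASCII character has a numeric value in the range 0-127.
for ch in "Az09!@":
    print(ch, ord(ch))

# ASCII text encodes to exactly one byte per character.
ascii_bytes = "Hello!".encode("ascii")

# A character outside ASCII cannot be encoded with it at all.
try:
    "é".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(list(ascii_bytes))  # [72, 101, 108, 108, 111, 33]
print(encodable)          # False
```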

The ANSI Character Set (Windows-1252)

Windows-1252, often loosely called "ANSI", was the default character set on Western-language versions of Windows.

  • Characters 0–127 are identical to ASCII
  • Characters 128–159 mostly hold printable symbols such as €, ™, and curly quotation marks
  • Characters 160–255 match ISO-8859-1 and the Unicode code points with the same numbers

<meta charset="Windows-1252">
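The 128–159 range is where Windows-1252 differs from ISO-8859-1. The following sketch decodes four bytes from that range with Python's cp1252 and latin-1 codecs (Python's names for Windows-1252 and ISO-8859-1); the byte values were picked only as examples.

```python
# Four byte values from the 128-159 range.
sample = bytes([0x80, 0x93, 0x94, 0x99])

# Windows-1252 assigns printable symbols to these bytes.
as_cp1252 = sample.decode("cp1252")

# Python's latin-1 codec maps them to invisible C1 control characters.
as_latin1 = sample.decode("latin-1")

print(as_cp1252)                          # €“”™
print([hex(ord(c)) for c in as_latin1])   # ['0x80', '0x93', '0x94', '0x99']
```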

The ISO-8859-1 Character Set

ISO-8859-1 was the default character set for HTML 4.

  • Identical to ASCII for values 0–127
  • Does not define characters 128–159
  • Matches Windows-1252 and the Unicode code points with the same numbers for values 160–255

HTML 4 example:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

HTML 5 example:

<meta charset="ISO-8859-1">
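The overlap in the 160–255 range can be checked directly. This sketch, which assumes Python's latin-1 and cp1252 codecs as stand-ins for ISO-8859-1 and Windows-1252, decodes every byte in that range with both and compares the results.

```python
# All byte values from 160 to 255.
upper = bytes(range(160, 256))

latin1_text = upper.decode("latin-1")
cp1252_text = upper.decode("cp1252")

# In ISO-8859-1 each byte value equals the Unicode code point it decodes to.
same_as_code_points = all(ord(ch) == b for ch, b in zip(latin1_text, upper))

# Windows-1252 agrees with ISO-8859-1 across this whole range.
same_as_cp1252 = latin1_text == cp1252_text

print(same_as_code_points, same_as_cp1252)  # True True
print(latin1_text[233 - 160])               # é (byte 233)
```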

The UTF-8 Character Set

UTF-8 is backward-compatible with ASCII and extends far beyond it.

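  • 0–127: identical to ASCII, stored as one byte each
  • 128–159: control characters with no printable symbols
  • 160–255: the same characters as in Windows-1252 and ISO-8859-1, but stored as two bytes each
  • 256+: well over 100,000 additional characters, stored as two to four bytes each

<meta charset="UTF-8">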
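The numbers in the list above are character numbers (Unicode code points), not byte values: UTF-8 uses one to four bytes per character. The sketch below encodes one arbitrary sample character from each size class and compares é in UTF-8 with the same letter in ISO-8859-1.

```python
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Code point, number of bytes, and the bytes in hexadecimal.
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex(" "))

byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]

# Same character, same code point (233), different bytes on disk.
e_utf8 = "é".encode("utf-8")      # b'\xc3\xa9'
e_latin1 = "é".encode("latin-1")  # b'\xe9'

# Pure ASCII text is byte-for-byte identical in both encodings.
ascii_compatible = "abc".encode("utf-8") == "abc".encode("ascii")
```

This is why a page saved as ISO-8859-1 but declared as UTF-8 (or the reverse) shows broken accented letters while its plain English text looks fine.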

Examples of UTF-8 Characters

Basic Latin

ABCD abcd 0123 ?#$%

Latin Extended

Ā Ă Ą Ć Ĉ Ċ Ē Ĕ Ė Ę

Diacritical Marks

à á â ã è é ê ẽ ò ó ô õ

General Punctuation

‰ ‱ ⁒ ‼ ⁇ ⁈ ⁉

Super and Subscript

C⁰ Cⁱ C⁴ C⁵ C₆ C₇ C₈

Braille Characters

⠓⠑⠇⠇⠕ ⠺⠕⠗⠇⠙
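Each of these characters has a Unicode code point, and HTML allows any of them to be written as a numeric character reference when typing the character directly is inconvenient. The following sketch, using one sample character from four of the groups above, prints the code point and the matching reference, then uses Python's html.unescape to confirm the reference resolves back to the same character.

```python
import html

for ch in "Ā‰⁴⠓":
    ref = f"&#{ord(ch)};"  # decimal numeric character reference
    print(ch, f"U+{ord(ch):04X}", ref, html.unescape(ref) == ch)

# Build the references once more as a list for reuse.
refs = [f"&#{ord(ch)};" for ch in "Ā‰⁴⠓"]
round_trip = "".join(html.unescape(r) for r in refs)
```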

Summary

Declaring UTF-8, and saving the file in that encoding, lets browsers display text, symbols, emoji, and international characters correctly.

For modern websites, UTF-8 is the safest and recommended encoding choice.
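The declaration only helps if the file's bytes really are UTF-8. As a minimal sketch of that idea, the following Python code writes a small page with an explicit encoding and reads the bytes back; the file name index.html and the page content are hypothetical.

```python
import tempfile
from pathlib import Path

page = (
    "<!DOCTYPE html>\n<html>\n<head>\n"
    '<meta charset="UTF-8">\n<title>Grüße</title>\n'
    "</head>\n<body>€ ✓</body>\n</html>\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.html"
    # Name the encoding explicitly instead of relying on the platform default;
    # newline="" keeps line endings exactly as written.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(page)
    stored = path.read_bytes()

# The stored bytes decode back to the original text as UTF-8.
round_trip = stored.decode("utf-8")
print(round_trip == page)  # True
```

Most text editors offer the same choice in their "Save As" or status-bar encoding setting.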
