UTF-8

UTF-8 is a <a href="/facts/Character_encoding/ShJHIMoA">character encoding</a> standard used for electronic communication. It is defined by the <a href="/facts/Unicode/DDu3MfPV">Unicode</a> Standard, and its name is derived from Unicode Transformation Format – 8-bit. Almost every webpage is in UTF-8.
UTF-8 supports all 1,112,064 valid Unicode <a href="/facts/Code_point/R184ugJX">code points</a> using a <a href="/facts/Variable-width_encoding/2LygbAV6">variable-width encoding</a> of one to four one-<a href="/facts/Byte/BNHUp9Qq">byte</a> (8-bit) code units.
Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. UTF-8 was designed for <a href="/facts/Backward_compatibility/FbHFBu5y">backward compatibility</a> with <a href="/facts/ASCII/vGUI33Qu">ASCII</a>: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are each encoded as a single byte with the same binary value as in ASCII, so a UTF-8-encoded file that uses only those characters is identical to an ASCII file. Most software designed for any <a href="/facts/Extended_ASCII/Agfdo1R0">extended ASCII</a> can read and write UTF-8, and this results in fewer internationalization issues than any alternative text encoding.
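The variable-width behaviour and the ASCII compatibility described above can be checked with a short sketch. It assumes Python 3 and its built-in `utf-8` and `ascii` codecs. The sample characters and the helper name `utf8_length` are chosen only for illustration and do not come from the Unicode Standard. The count of valid code points is recomputed on the assumption that it equals the full code space (U+0000 to U+10FFFF) minus the surrogate range (U+D800 to U+DFFF).

```python
# Illustrative sample characters (an assumption for this sketch):
# U+0041 "A", U+00E9 "e with acute", U+20AC euro sign, U+1F600 emoji.
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return how many bytes Python's UTF-8 codec uses for one character."""
    return len(ch.encode("utf-8"))


# Lower code points need fewer bytes; higher ones need up to four.
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")

# Text limited to the first 128 code points yields the same bytes
# under both the ASCII and UTF-8 codecs.
ascii_text = "Hello, world!"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print("ASCII text identical in UTF-8:", same_bytes)

# Full code space minus the 2,048 surrogate code points.
valid_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)
print("Valid code points:", valid_code_points)
```

The four samples fall in the one-, two-, three-, and four-byte ranges respectively, so the loop shows each possible encoded length once.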
UTF-8 is the dominant encoding on the internet across all countries and languages, with 99% global average use. It is used in most standards, where it is often the only allowed encoding, and is supported by all modern operating systems and programming languages.
