utf-8 - How many characters can UTF-8 encode?

Tags: utf-8 character-encoding ascii

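If UTF-8 is 8 bits, doesn't that mean there can be at most 256 different characters?

The first 128 code points are the same as in ASCII. But it says UTF-8 can support up to a million characters?

How does this work?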

Best answer

UTF-8 does not always use 1 byte per character; it uses 1 to 4 bytes.
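To see the variable length for yourself, here is a minimal Python sketch. The helper name `utf8_length` and the four sample characters are my own picks for illustration, not part of the original answer; the characters are written as escapes so the exact code points are unambiguous.

```python
# Sample characters, one for each UTF-8 length (illustrative choices).
SAMPLES = {
    "A": "Latin capital letter A (US-ASCII)",
    "\u00e9": "Latin small letter e with acute",
    "\u4e2d": "CJK ideograph in the Basic Multilingual Plane",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}


def utf8_length(ch: str) -> int:
    """Return how many bytes Python's UTF-8 encoder emits for one character."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch, description in SAMPLES.items():
        encoded = ch.encode("utf-8")
        # bytes.hex(sep) requires Python 3.8 or newer.
        print(f"U+{ord(ch):04X} {description}: "
              f"{utf8_length(ch)} byte(s) -> {encoded.hex(' ')}")
```

Each sample should report a different length, from 1 byte for `A` up to 4 bytes for the emoji.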

The first 128 characters (US-ASCII) need one byte.

The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets, as well as Combining Diacritical Marks.

Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use including most Chinese, Japanese and Korean [CJK] characters.

Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).

Source: Wikipedia
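The "million characters" figure can be checked with a little arithmetic. The sketch below is my own addition, not from the quoted text: it assumes the standard UTF-8 range boundaries (U+007F, U+07FF, U+FFFF, U+10FFFF) and excludes the surrogate block U+D800 to U+DFFF, which is not valid in UTF-8. The names `RANGES`, `count_by_length`, and `utf8_len` are hypothetical. Note that it counts encodable code points, not characters that have actually been assigned.

```python
# (first code point, last code point, bytes per code point in UTF-8)
RANGES = [
    (0x0000, 0x007F, 1),
    (0x0080, 0x07FF, 2),
    (0x0800, 0xFFFF, 3),
    (0x10000, 0x10FFFF, 4),
]

# Surrogates are reserved for UTF-16 and cannot be encoded in UTF-8.
SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF


def count_by_length():
    """Count encodable code points for each UTF-8 byte length."""
    counts = {}
    for first, last, nbytes in RANGES:
        total = last - first + 1
        # The surrogate block lies entirely inside the 3-byte range.
        if first <= SURROGATE_FIRST and SURROGATE_LAST <= last:
            total -= SURROGATE_LAST - SURROGATE_FIRST + 1
        counts[nbytes] = total
    return counts


def utf8_len(code_point):
    """Look up the UTF-8 length of a code point from the range table."""
    if SURROGATE_FIRST <= code_point <= SURROGATE_LAST:
        raise ValueError("surrogates are not encodable in UTF-8")
    for first, last, nbytes in RANGES:
        if first <= code_point <= last:
            return nbytes
    raise ValueError("code point out of Unicode range")


if __name__ == "__main__":
    counts = count_by_length()
    for nbytes, total in counts.items():
        print(f"{nbytes} byte(s): {total:,} code points")
    print(f"total: {sum(counts.values()):,}")  # 1,112,064
```

The 2-byte count comes out to 1,920, matching the figure in the quote, and the overall total is 1,112,064 code points.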

Regarding "utf-8 - How many characters can UTF-8 encode?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10229156/
