utf-8 - How many characters can UTF-8 encode?

Tags: utf-8 character-encoding ascii

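If UTF-8 is 8 bits, doesn't that mean there can be at most 256 different characters?

The first 128 code points are the same as in ASCII. But I have read that UTF-8 can support up to a million characters?

How does this work?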

Best answer

UTF-8 does not always use 1 byte per character; it uses 1 to 4 bytes.
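A quick way to see the variable length for yourself is to encode a few characters and count the bytes. This is only a minimal sketch in Python 3; the helper name `utf8_length` and the four sample characters are my own picks for illustration, not part of any library.

```python
# Sample characters written as escapes so the source file's own encoding
# cannot affect the result.
samples = [
    "A",           # U+0041, ASCII letter
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u4e2d",      # U+4E2D, a common CJK ideograph
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]


def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies in UTF-8 (hypothetical helper)."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")
```

The four samples should come out as 1, 2, 3 and 4 bytes respectively, one for each of the length classes described in this answer.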

The first 128 characters (US-ASCII) need one byte.

The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets, as well as Combining Diacritical Marks.

Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use[12] including most Chinese, Japanese and Korean [CJK] characters.

Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).

Source: Wikipedia
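To connect the byte lengths above to the "about a million" figure in the question, here is a rough sketch that counts how many code points fall into each length class. It assumes the standard Unicode range U+0000 to U+10FFFF and excludes the surrogate block U+D800 to U+DFFF, which is reserved for UTF-16 and is not valid in UTF-8. The names `ranges`, `SURROGATES` and `count_encodable` are made up for this example. Note that these are code points, not assigned characters; many of them are still unallocated.

```python
# Surrogate code points are reserved for UTF-16 and cannot be encoded in UTF-8.
SURROGATES = range(0xD800, 0xE000)  # 2,048 code points

# Code point ranges for each UTF-8 sequence length (end values are exclusive).
ranges = {
    1: range(0x0000, 0x0080),      # US-ASCII
    2: range(0x0080, 0x0800),
    3: range(0x0800, 0x10000),     # rest of the Basic Multilingual Plane
    4: range(0x10000, 0x110000),   # supplementary planes
}


def count_encodable(r: range) -> int:
    """Count code points in r, leaving out any that are surrogates."""
    overlap = range(max(r.start, SURROGATES.start), min(r.stop, SURROGATES.stop))
    return len(r) - len(overlap)  # an empty overlap has length 0


counts = {n: count_encodable(r) for n, r in ranges.items()}
total = sum(counts.values())

for n, c in counts.items():
    print(f"{n} byte(s): {c:,} code points")
print(f"total: {total:,}")
```

Under those assumptions the counts are 128, 1,920, 61,440 and 1,048,576, for a total of 1,112,064 encodable code points, which is where the "million characters" claim comes from.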

For "utf-8 - How many characters can UTF-8 encode?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10229156/
