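If UTF-8 is 8 bits, doesn't that mean there can be at most 256 different characters?
The first 128 code points are the same as in ASCII. But it is said that UTF-8 can support up to a million characters?
How does this work?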
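Best answer
UTF-8 does not always use one byte per character. The "8" refers to the 8-bit code unit, and each character is encoded with 1 to 4 of those bytes.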
The first 128 characters (US-ASCII) need one byte.
The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets, as well as Combining Diacritical Marks.
Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use, including most Chinese, Japanese and Korean [CJK] characters.
Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).
Source: Wikipedia
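To make the byte counts above concrete, here is a small Python sketch. The range boundaries are the standard UTF-8 ones (U+0000 to U+007F, U+0080 to U+07FF, U+0800 to U+FFFF, U+10000 to U+10FFFF); the names `RANGES`, `utf8_length` and `range_sizes` are just illustrative helpers, not part of any library. It checks how many bytes a few sample characters take and counts how many code points fall into each length class.

```python
# Code point ranges covered by each UTF-8 encoded length (in bytes).
RANGES = [
    (1, 0x0000, 0x007F),      # US-ASCII
    (2, 0x0080, 0x07FF),      # rest of Latin, Greek, Cyrillic, Hebrew, Arabic, ...
    (3, 0x0800, 0xFFFF),      # rest of the Basic Multilingual Plane
    (4, 0x10000, 0x10FFFF),   # supplementary planes (rare CJK, emoji, ...)
]

# Surrogate code points U+D800..U+DFFF are reserved for UTF-16
# and cannot be encoded as characters in UTF-8.
SURROGATES = 0xDFFF - 0xD800 + 1


def utf8_length(ch):
    """Return the number of bytes Python's UTF-8 encoder uses for one character."""
    return len(ch.encode("utf-8"))


def range_sizes():
    """Map each byte length to the number of code points in its range."""
    return {n: hi - lo + 1 for n, lo, hi in RANGES}


# One sample per length class: A, e-acute, a CJK ideograph, an emoji.
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s)")

sizes = range_sizes()
total = sum(sizes.values())
print("code points per length:", sizes)
print("total code points:", total)
print("encodable characters (excluding surrogates):", total - SURROGATES)
```

The two-byte class comes out at 1,920 code points, matching the quote, and the total of 1,114,112 code points (1,112,064 once the surrogates are excluded) is where the "about a million characters" figure comes from.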
Regarding utf-8 - How many characters can UTF-8 encode?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10229156/