java - Can Character represent all Unicode code points?

Tags: java unicode utf-16

Since a Java char is 16 bits wide, how can it represent the full range of Unicode code points? It can only represent 65,536 code points, right?

Best answer

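Yes, a Java char is a single UTF-16 code unit, so on its own it can hold only 65,536 distinct values. To represent Unicode characters outside the Basic Multilingual Plane, you need to use surrogate pairs in a java.lang.String. The String class provides several methods for working with full Unicode code points, such as codePointAt(index).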
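As a rough illustration of the point above, the sketch below builds a string containing one character outside the Basic Multilingual Plane and compares the code unit count with the code point count. The class name SupplementaryDemo and the helper methods codeUnits and codePoints are made up for this example; the calls they wrap (String.length, String.codePointCount, String.codePointAt, Character.toChars) are standard library methods.

```java
public class SupplementaryDemo {
    // U+1F600 lies outside the Basic Multilingual Plane, so it needs two chars.
    static final String SAMPLE = new String(Character.toChars(0x1F600));

    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a" + SAMPLE;

        System.out.println(codeUnits(s));   // 3: 'a' plus a surrogate pair
        System.out.println(codePoints(s));  // 2: 'a' plus U+1F600

        // codePointAt combines the surrogate pair starting at index 1.
        System.out.printf("U+%X%n", s.codePointAt(1)); // U+1F600

        // The two chars are the high and low halves of the pair.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true
    }
}
```

Note that charAt(1) here returns only half of the character, which is why code that must handle supplementary characters usually iterates by code point rather than by char.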

From section 3.1 of the Java Language Specification:

The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding. A few APIs, primarily in the Character class, use 32-bit integers to represent code points as individual entities. The Java platform provides methods to convert between the two representations.
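The surrogate-pair arithmetic described in the quoted passage can be sketched by hand and checked against the conversion methods the platform already provides. The class SurrogateMath and its encode and decode methods are hypothetical names for this example, not part of the JDK; Character.highSurrogate, Character.lowSurrogate, and Character.toCodePoint are the real library equivalents.

```java
public class SurrogateMath {
    // Splits a supplementary code point (U+10000 to U+10FFFF) into a surrogate pair.
    static char[] encode(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    // Recombines a high and low surrogate into the original code point.
    static int decode(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char[] pair = encode(cp);

        // Cast to int because %X does not accept a char argument.
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00

        // The hand-written arithmetic agrees with the library methods.
        System.out.println(pair[0] == Character.highSurrogate(cp));               // true
        System.out.println(pair[1] == Character.lowSurrogate(cp));                // true
        System.out.println(decode(pair[0], pair[1]) == Character.toCodePoint(pair[0], pair[1])); // true
    }
}
```

In real code there is little reason to write this arithmetic yourself; Character.toChars(int) and Character.toCodePoint(char, char) do the same conversions.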

See the Character docs for more information.

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/8768327/
