java - What is the correct way to get the code point of a character?

Tags: java unicode char codepoint

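I need to do some work with code points and newlines. I have a function that takes the code point of a char, and it needs to behave differently if that code point is \r. I have this: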

if (codePoint == Character.codePointAt(new char[] {'\r'}, 0)) {

But this is very ugly, and surely not the right way to do it. What is the correct way?

(I know I could hard-code the number 13, the decimal value of \r, and use that, but then it would be unclear what the code is doing...)

Best answer

If you know that all of your input will be in the Basic Multilingual Plane (U+0000 to U+FFFF), then you can simply use:

char character = 'x';
int codePoint = character;

This uses the implicit conversion from char to int, as specified in JLS 5.1.2:

19 specific conversions on primitive types are called the widening primitive conversions:

  • ...
  • char to int, long, float, or double

...

A widening conversion of a char to an integral type T zero-extends the representation of the char value to fill the wider format.
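Applied to the check in the question, the widening conversion means a char literal can be compared directly against an int code point, with no call to Character.codePointAt. The following is a minimal sketch rather than a definitive implementation; the class name CarriageReturnCheck and the method isCarriageReturn are hypothetical names chosen for illustration.

```java
class CarriageReturnCheck {
    // Hypothetical helper: true when the code point is U+000D (carriage return).
    static boolean isCarriageReturn(int codePoint) {
        // '\r' is a char; in the comparison it is widened to the int value 13.
        return codePoint == '\r';
    }

    public static void main(String[] args) {
        System.out.println(isCarriageReturn(13));                    // true
        System.out.println(isCarriageReturn('\n'));                  // false
        System.out.println(isCarriageReturn("a\rb".codePointAt(1))); // true
    }
}
```

This keeps the intent readable (the literal '\r' appears in the source) without hard-coding 13.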

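However, a char is only a UTF-16 code unit. The point of Character.codePointAt is that it handles code points outside the BMP, which are represented by surrogate pairs: two UTF-16 code units that together form a single character.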

From JLS 3.1:

The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
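To make the surrogate-pair encoding concrete, the sketch below builds a string from one supplementary code point and inspects its code units. U+1F600 is an arbitrary example picked here, and the class name SurrogatePairDemo and method codeUnits are hypothetical; only standard java.lang.Character and String methods are used.

```java
class SurrogatePairDemo {
    // Hypothetical helper: number of UTF-16 code units needed for a code point.
    static int codeUnits(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so it needs a surrogate pair.
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                      // 2 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point

        // A plain char-to-int conversion only sees the high surrogate.
        int firstUnit = s.charAt(0);
        System.out.println(Integer.toHexString(firstUnit));  // d83d
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true

        // codePointAt combines both code units into the full code point.
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
    }
}
```

For input that stays within the BMP, such as \r, the code unit and the code point are the same value, which is why the simple conversion is sufficient there.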

If you need to handle the more complex cases, you will need more complex code.
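One possible shape for that more complex code is to walk a string by code point instead of by char, so that supplementary characters are never split. This is a sketch under assumptions, not something the answer itself provides: the class CodePointScanner and the method countCarriageReturns are hypothetical, and counting \r is just a stand-in for whatever special handling is needed.

```java
class CodePointScanner {
    // Hypothetical helper: counts carriage returns while iterating by code point.
    static int countCarriageReturns(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            // Reads a full code point, combining a surrogate pair if present.
            int codePoint = text.codePointAt(i);
            if (codePoint == '\r') {
                count++;
            }
            // Advance by 1 code unit for BMP characters, 2 for supplementary ones.
            i += Character.charCount(codePoint);
        }
        return count;
    }

    public static void main(String[] args) {
        String supplementary = new String(Character.toChars(0x1F600));
        String text = "a\r" + supplementary + "\r\n";
        System.out.println(countCarriageReturns(text)); // 2
    }
}
```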

Source: the Stack Overflow question "java - What is the correct way to get the code point of a character?": https://stackoverflow.com/questions/25826859/
