java - What is the correct way to get a character's code point?

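I need to do some work with code points and newlines. I have a function that takes the code point of a char, and it needs to behave differently when that code point is \r. This is what I have: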

if (codePoint == Character.codePointAt(new char[] {'\r'}, 0)) {

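But this is very ugly and surely not the right way to do it. What is the correct way?

(I know I could hard-code the number 13, the decimal value of \r, and use that, but then it would be unclear what the code is doing...)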

Best answer

If you know that all of your input will be in the Basic Multilingual Plane (U+0000 to U+FFFF), then you can simply use:

char character = 'x';
int codePoint = character;

This uses the implicit conversion from char to int, as specified in JLS 5.1.2:

19 specific conversions on primitive types are called the widening primitive conversions:

...

char to int, long, float, or double

...

A widening conversion of a char to an integral type T zero-extends the representation of the char value to fill the wider format.
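Applied to the check in the question, this means a char literal can be compared directly against an int code point, because the char operand is widened to int for the comparison. A minimal sketch, where the class and method names are hypothetical and only illustrate the idea:

```java
class CarriageReturnCheck {
    // Hypothetical helper: true when the given code point is a carriage return.
    static boolean isCarriageReturn(int codePoint) {
        // '\r' is a char; it is widened to the int value 13 for the comparison.
        return codePoint == '\r';
    }

    public static void main(String[] args) {
        System.out.println(isCarriageReturn(13));   // prints "true"
        System.out.println(isCarriageReturn('\n')); // prints "false"
    }
}
```

This keeps the intent readable without hard-coding 13 and without building a char array just to call Character.codePointAt.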

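However, a char is just a UTF-16 code unit. The point of Character.codePointAt is that it handles code points outside the BMP, which are made up of surrogate pairs: two UTF-16 code units joined together to form a single character.

From JLS 3.1: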

The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
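To make the difference concrete, here is a small sketch that contrasts reading a single char with reading a full code point. The class name, method names, and the choice of U+1F600 as a sample supplementary character are assumptions made for illustration:

```java
class SurrogateDemo {
    // U+1F600 lies outside the BMP, so it needs two UTF-16 code units.
    static final String SAMPLE = new String(Character.toChars(0x1F600));

    // Reads only the first UTF-16 code unit, using the implicit char-to-int widening.
    static int firstCodeUnit(String s) {
        return s.charAt(0);
    }

    // Reads the full code point, combining a surrogate pair when one is present.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.printf("length = %d%n", SAMPLE.length());                          // length = 2
        System.out.printf("first code unit = U+%04X%n", firstCodeUnit(SAMPLE));       // U+D83D
        System.out.printf("first code point = U+%04X%n", firstCodePoint(SAMPLE));     // U+1F600
    }
}
```

For a BMP character such as \r, both methods return the same value, which is why the simple conversion is sufficient in that case.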

If you need to be able to handle the more complex cases, you will need more complex code.
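One possible shape for that more complex code is to walk the text by code point rather than by char. The sketch below is not part of the original answer; the class and method names are hypothetical, and it assumes the goal is simply to count carriage returns in a string that may contain supplementary characters:

```java
class CodePointScan {
    // Hypothetical helper: counts carriage returns while walking the text by code point.
    static int countCarriageReturns(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint == '\r') {
                count++;
            }
            // Advance by 1 for a BMP code point, by 2 for a surrogate pair.
            i += Character.charCount(codePoint);
        }
        return count;
    }
}
```

Advancing with Character.charCount keeps the index from landing in the middle of a surrogate pair.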

Regarding "java - What is the correct way to get a character's code point?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25826859/
