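I need to do some work with code points and newline characters. I have a function that takes the code point of a char, and it needs to behave differently if that code point is \r. I have this: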
if (codePoint == Character.codePointAt(new char[] {'\r'}, 0)) {
But this is really ugly and surely not the right way to do it. What is the proper way?
(I know I could hardcode the number 13 (the decimal value of \r) and use that, but doing so would make it unclear what I'm doing...)
Best answer
If you know that all your input will be in the Basic Multilingual Plane (U+0000 to U+FFFF), you can simply use:
char character = 'x';
int codePoint = character;
This uses the implicit widening conversion from char to int, as specified in JLS 5.1.2:
19 specific conversions on primitive types are called the widening primitive conversions:
- ...
- char to int, long, float, or double
- ...
A widening conversion of a char to an integral type T zero-extends the representation of the char value to fill the wider format.
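Applied to the question, this means the int code point can be compared directly against the char literal '\r': the literal is widened to int (value 13) by the same conversion, so neither an array wrapper nor a magic number is needed. A minimal sketch follows; the class name WideningDemo and the helper isCarriageReturn are hypothetical illustrations, not part of any library. Because U+000D lies in the BMP, the comparison is also correct when codePoint was obtained from Character.codePointAt.

```java
public class WideningDemo {
    // The char literal '\r' is widened to int (value 13) in the comparison,
    // so neither a magic number nor Character.codePointAt is needed.
    // isCarriageReturn is a hypothetical helper name used for illustration.
    static boolean isCarriageReturn(int codePoint) {
        return codePoint == '\r';
    }

    public static void main(String[] args) {
        char character = 'x';
        int codePoint = character;                        // implicit widening, zero-extended
        System.out.println(codePoint);                    // 120
        System.out.println((int) '\r');                   // 13
        System.out.println(isCarriageReturn('\r'));       // true
        System.out.println(isCarriageReturn(codePoint));  // false
    }
}
```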
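However, a char is only a single UTF-16 code unit. The point of Character.codePointAt is that it handles code points outside the BMP, which are made up of surrogate pairs: two UTF-16 code units that together represent one character.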
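To illustrate the difference, the sketch below builds a string containing U+1F600 (a supplementary character, chosen here only as an example) followed by \r, and compares charAt with Character.codePointAt. The class name CodePointAtDemo and its sample helper are hypothetical.

```java
public class CodePointAtDemo {
    // Hypothetical sample: U+1F600 (outside the BMP) followed by a carriage return.
    static String sample() {
        return new String(Character.toChars(0x1F600)) + "\r";
    }

    public static void main(String[] args) {
        String s = sample();

        System.out.println(s.length());                      // 3 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points

        // charAt sees only the high surrogate of the pair.
        System.out.printf("charAt(0)      = U+%04X%n", (int) s.charAt(0));            // U+D83D
        // codePointAt combines the high and low surrogates into one code point.
        System.out.printf("codePointAt(0) = U+%04X%n", Character.codePointAt(s, 0)); // U+1F600

        // The next character starts after the whole pair, i.e. at index 2.
        int next = Character.charCount(Character.codePointAt(s, 0));
        System.out.println(Character.codePointAt(s, next) == '\r');                  // true
    }
}
```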
From JLS 3.1:
The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
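The quote gives the surrogate ranges but not the mapping itself. Under the standard UTF-16 scheme, 0x10000 is subtracted from the code point, and the remaining 20 bits are split into a high 10 bits (added to 0xD800) and a low 10 bits (added to 0xDC00). The sketch below implements that arithmetic by hand only to show how it works, and checks it against the JDK's own Character.toChars and Character.toCodePoint; in real code those JDK methods are the better choice. SurrogateDemo and toSurrogatePair are hypothetical names.

```java
import java.util.Arrays;

public class SurrogateDemo {
    // Hand-written UTF-16 encoding of a supplementary code point (U+10000..U+10FFFF).
    // For illustration only; Character.toChars performs the same conversion.
    static char[] toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits -> U+D800..U+DBFF
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits -> U+DC00..U+DFFF
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char[] pair = toSurrogatePair(cp);
        System.out.printf("U+%X -> %04X %04X%n", cp, (int) pair[0], (int) pair[1]); // U+1F600 -> D83D DE00
        System.out.println(Arrays.equals(pair, Character.toChars(cp)));             // true
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == cp);          // true

        // For BMP characters such as '\r', the code point and the code unit are identical.
        System.out.println(Character.toChars('\r').length);                          // 1
    }
}
```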
If you need to handle these more complex cases (characters outside the BMP), you will need more complex code.
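One possible form of that more complex code, assuming the input arrives as a String and Java 8 or later is available, is to iterate by code point instead of by char, so that surrogate pairs are never split. The hypothetical helper showCarriageReturns below replaces each carriage return with the visible two-character text \r and passes every other code point through unchanged; showCarriageReturnsLoop (also hypothetical) is the equivalent pre-Java-8 loop that advances by Character.charCount.

```java
public class CodePointLoop {
    // Hypothetical helper: make carriage returns visible while copying every
    // other code point, including supplementary characters, unchanged.
    static String showCarriageReturns(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {   // IntStream of whole code points
            if (cp == '\r') {
                out.append("\\r");
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Same behaviour without streams: step by 1 or 2 chars per code point.
    static String showCarriageReturnsLoop(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (cp == '\r') {
                out.append("\\r");
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "line1\r\nline2 " + new String(Character.toChars(0x1F600)) + "\r";
        System.out.println(showCarriageReturns(text));
        System.out.println(showCarriageReturns(text).equals(showCarriageReturnsLoop(text)));
    }
}
```

Iterating by code point keeps the \r check a simple int comparison while still treating a surrogate pair as one character.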
A similar question, "java - What is the proper way to get a character's code point?", can be found on Stack Overflow: https://stackoverflow.com/questions/25826859/