java - Converting UCS-4 to UCS-2

The UCS-4 character '🤣' has the Unicode value 0001f923. When I paste it into Java code in IntelliJ IDEA, it is automatically changed to \uD83E\uDD23.

Java only supports UCS-2, so a conversion from UCS-4 to UCS-2 takes place.

I would like to know the logic behind this conversion, but I could not find any material on it.

Best answer

https://en.wikipedia.org/wiki/UTF-16#U+010000_to_U+10FFFF

U+010000 to U+10FFFF

0x10000 is subtracted from the code point (U), leaving a 20-bit number (U') in the range 0x00000–0xFFFFF. U is defined to be no greater than 0x10FFFF.

The high ten bits (in the range 0x000–0x3FF) are added to 0xD800 to give the first 16-bit code unit or high surrogate (W1), which will be in the range 0xD800–0xDBFF.

The low ten bits (also in the range 0x000–0x3FF) are added to 0xDC00 to give the second 16-bit code unit or low surrogate (W2), which will be in the range 0xDC00–0xDFFF.
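
As a quick sanity check of the ranges quoted above, the following minimal Java sketch applies the three steps to the smallest and the largest supplementary code points. The class name and the two helper methods are made up for illustration and are not part of any library.

```java
public class SurrogateRangeCheck {
    // Hypothetical helper: first 16-bit code unit (W1) of a supplementary code point.
    static int highSurrogate(int codePoint) {
        return 0xD800 + ((codePoint - 0x10000) >> 10);
    }

    // Hypothetical helper: second 16-bit code unit (W2) of a supplementary code point.
    static int lowSurrogate(int codePoint) {
        return 0xDC00 + ((codePoint - 0x10000) & 0x3FF);
    }

    public static void main(String[] args) {
        // Smallest supplementary code point, U' = 0x00000: prints D800 DC00
        System.out.printf("%X %X%n", highSurrogate(0x10000), lowSurrogate(0x10000));
        // Largest code point, U' = 0xFFFFF: prints DBFF DFFF
        System.out.printf("%X %X%n", highSurrogate(0x10FFFF), lowSurrogate(0x10FFFF));
    }
}
```

The two extremes land exactly on the ends of the stated ranges, 0xD800 to 0xDBFF for W1 and 0xDC00 to 0xDFFF for W2.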

Now apply this to the code point U+1F923:

0x1F923 - 0x10000 = 0xF923
0xF923 = 1111100100100011 (16 bits) = 00001111100100100011 (padded to 20 bits) = [0000111110][0100100011] = [0x3E][0x123]
0xD800 + 0x3E = 0xD83E
0xDC00 + 0x123 = 0xDD23
Result: \uD83E\uDD23

In code:

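public class SurrogatePair {
    public static void main(String[] args) {
        int input = 0x1f923;      // code point U+1F923
        int x = input - 0x10000;  // 20-bit value U'

        int highTenBits = x >> 10;             // top 10 bits
        int lowTenBits = x & ((1 << 10) - 1);  // bottom 10 bits (mask 0x3FF)

        int high = highTenBits + 0xd800;  // high surrogate W1
        int low = lowTenBits + 0xdc00;    // low surrogate W2

        System.out.printf("[%x][%x]%n", high, low);  // prints [d83e][dd23]
    }
}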
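
The manual computation can also be cross-checked against the JDK, which already implements this split. The sketch below assumes Java 7 or later (for Character.highSurrogate and Character.lowSurrogate); the class name and the describe helper are invented for this example.

```java
public class LibraryCrossCheck {
    // Hypothetical helper: formats the UTF-16 code units of a code point
    // in the same [hex][hex] style as the hand-written version.
    static String describe(int codePoint) {
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            sb.append(String.format("[%x]", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int input = 0x1f923;

        // Character.toChars returns both surrogates: prints [d83e][dd23]
        System.out.println(describe(input));

        // The two halves are also available individually: prints [d83e][dd23]
        System.out.printf("[%x][%x]%n",
                (int) Character.highSurrogate(input),
                (int) Character.lowSurrogate(input));

        // Reading the escaped pair back yields the original code point: prints 1f923
        System.out.println(Integer.toHexString("\uD83E\uDD23".codePointAt(0)));
    }
}
```

For code points below 0x10000, Character.toChars returns a single char, so describe(0x41) gives [41].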

Regarding "java - Converting UCS-4 to UCS-2", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57954090/
