java - Converting UCS-4 to UCS-2

Tags: java ucs2 ucs-4

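The UCS-4 character '🤣' has the Unicode value 0001f923. When I paste it into Java code in IntelliJ IDEA, it is automatically changed to the corresponding value \uD83E\uDD23.

Java only supports UCS-2, so a conversion from UCS-4 to UCS-2 takes place.

I would like to know the logic behind this conversion, but I could not find any material on it.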

Best answer

https://en.wikipedia.org/wiki/UTF-16#U+010000_to_U+10FFFF

U+010000 to U+10FFFF

  • 0x10000 is subtracted from the code point (U), leaving a 20-bit number (U') in the range 0x00000–0xFFFFF. U is defined to be no greater than 0x10FFFF.
  • The high ten bits (in the range 0x000–0x3FF) are added to 0xD800 to give the first 16-bit code unit or high surrogate (W1), which will be in the range 0xD800–0xDBFF.
  • The low ten bits (also in the range 0x000–0x3FF) are added to 0xDC00 to give the second 16-bit code unit or low surrogate (W2), which will be in the range 0xDC00–0xDFFF.

Now take the code point U+1F923 as input:

  • 0x1F923 - 0x10000 = 0xF923
  • 0xF923 = 1111100100100011 = 00001111100100100011 (padded to 20 bits) = [0000111110][0100100011] = [0x3E][0x123]
  • 0xD800 + 0x3E = 0xD83E
  • 0xDC00 + 0x123 = 0xDD23
  • Result: \uD83E\uDD23
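As a sanity check on the arithmetic above, the same steps can be run in reverse to recover the code point from the surrogate pair. This is only a minimal sketch; the class name `SurrogateDecode` and the method `decode` are hypothetical names chosen for illustration and are not part of the original answer.

```java
public class SurrogateDecode {
    // Hypothetical helper: reverses the worked steps for one surrogate pair.
    public static int decode(int high, int low) {
        int highTenBits = high - 0xD800; // undo the high-surrogate offset
        int lowTenBits = low - 0xDC00;   // undo the low-surrogate offset
        // Reassemble the 20-bit value, then add back the 0x10000 that was subtracted.
        return ((highTenBits << 10) | lowTenBits) + 0x10000;
    }

    public static void main(String[] args) {
        // Prints 1f923
        System.out.println(Integer.toHexString(decode(0xD83E, 0xDD23)));
    }
}
```

Getting 0x1F923 back suggests that the split into [0x3E][0x123] was done correctly.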

In code:

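public class Main {
    public static void main(String[] args) {
        int input = 0x1f923;      // code point U+1F923
        int x = input - 0x10000;  // 20-bit value U'

        int highTenBits = x >> 10;             // top ten bits
        int lowTenBits = x & ((1 << 10) - 1);  // bottom ten bits (mask 0x3FF)

        int high = highTenBits + 0xd800;  // high surrogate W1
        int low = lowTenBits + 0xdc00;    // low surrogate W2

        // Prints [d83e][dd23]
        System.out.println(String.format("[%x][%x]", high, low));
    }
}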
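The manual calculation above can be cross-checked against the standard library, which performs the same split in `Character.toChars`. The sketch below assumes only `java.lang.Character`; the class name `SurrogateLibraryCheck` and the helper `surrogates` are hypothetical and exist only for this illustration.

```java
public class SurrogateLibraryCheck {
    // Hypothetical helper: formats each UTF-16 code unit of a code point as [hex].
    public static String surrogates(int codePoint) {
        // Returns one char for BMP code points and two for supplementary ones.
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            sb.append(String.format("[%x]", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints [d83e][dd23], matching the manual calculation
        System.out.println(surrogates(0x1f923));
        // The reverse direction: prints 1f923
        System.out.println(Integer.toHexString(Character.toCodePoint('\uD83E', '\uDD23')));
    }
}
```

`Character.highSurrogate(int)` and `Character.lowSurrogate(int)` return the two code units individually, if only one of them is needed.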

For "java - Converting UCS-4 to UCS-2", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57954090/
