I'm trying to understand some of the String class functions in Java. Here is a simple piece of code:
    /* different experiments with String class */
    public class TestStrings {
        public static void main(String[] args) {
            String greeting = "Hello\uD835\uDD6b";
            System.out.println("Number of code units in greeting is " + greeting.length());
            System.out.println("Number of code points " + greeting.codePointCount(0, greeting.length()));
            int index = greeting.offsetByCodePoints(0, 6);
            System.out.println("index = " + index);
            int cp = greeting.codePointAt(index);
            System.out.println("Code point at index is " + (char) cp);
        }
    }
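\uD835\uDD6b is the double-struck small z symbol 𝕫 (U+1D56B). It lies outside the Basic Multilingual Plane, so it is stored as a surrogate pair.
Therefore the string has 6 (six) code points and 7 (seven) code units (2-byte chars). As the documentation says: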
offsetByCodePoints
public int offsetByCodePoints(int index, int codePointOffset)
Returns the index within this String that is offset from the given index by codePointOffset code points. Unpaired surrogates within the text range given by index and codePointOffset count as one code point each.
Parameters:
    index - the index to be offset
    codePointOffset - the offset in code points
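So the argument really is given in code points. Yet with the arguments (0,6) the call works without any exception, while codePointAt() fails because it receives 7, which is out of bounds. So does the function perhaps take its arguments in code units? Or am I missing something?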
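The Javadoc sentence about unpaired surrogates is easy to check directly. The sketch below (the class and method names are illustrative, not from the question) compares a string that holds a valid surrogate pair with one that holds a lone high surrogate:

```java
public class UnpairedSurrogates {
    // Number of code points in the whole string, as String.codePointCount reports it.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String paired = "a\uD835\uDD6Bb"; // 'a', a valid surrogate pair, 'b'
        String lone = "a\uD835b";         // 'a', a high surrogate with no low surrogate after it, 'b'

        System.out.println(paired.length() + " chars, " + codePoints(paired) + " code points"); // 4 chars, 3 code points
        System.out.println(lone.length() + " chars, " + codePoints(lone) + " code points");     // 3 chars, 3 code points

        // Skipping 2 code points lands after the pair in one case, after the lone surrogate in the other.
        System.out.println(paired.offsetByCodePoints(0, 2)); // 3
        System.out.println(lone.offsetByCodePoints(0, 2));   // 2
    }
}
```

In both strings the surrogate material counts as a single code point; only the number of chars it occupies differs.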
Best answer
codePointAt takes a char index:
The index refers to char values (Unicode code units) and ranges from 0 to length() - 1.
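One way to see that codePointAt works on char indexes is to call it at each position around the surrogate pair. This is a minimal sketch; the class name CodePointAtDemo is made up for illustration:

```java
public class CodePointAtDemo {
    static final String GREETING = "Hello\uD835\uDD6b"; // 7 chars, 6 code points

    public static void main(String[] args) {
        // Char index 5 holds the high surrogate: codePointAt pairs it with
        // the low surrogate at index 6 and returns the whole code point.
        System.out.printf("codePointAt(5) = U+%X%n", GREETING.codePointAt(5)); // U+1D56B
        // Char index 6 holds the low surrogate alone, so only that char comes back.
        System.out.printf("codePointAt(6) = U+%X%n", GREETING.codePointAt(6)); // U+DD6B
        // Char index 7 equals length(), outside the valid range 0..length()-1.
        try {
            GREETING.codePointAt(7);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("codePointAt(7) threw " + e.getClass().getName());
        }
    }
}
```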
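There are six code points in the string. The offsetByCodePoints call returns the index reached after skipping 6 code points, which is char index 7, equal to greeting.length(). You then try to get codePointAt(7), which is past the end of the string.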
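Following this explanation, the question's code would offset by 5 code points, not 6, to reach the start of the last code point. A sketch of that is below (the helpers indexOfCodePoint and lastCodePoint are hypothetical). It also replaces the (char) cast, since casting a supplementary code point to char keeps only its low 16 bits:

```java
public class LastCodePoint {
    // Char index at which the code point numbered n (counting from 0) starts.
    static int indexOfCodePoint(String s, int n) {
        return s.offsetByCodePoints(0, n);
    }

    // Last code point of s; assumes s is non-empty.
    static int lastCodePoint(String s) {
        int count = s.codePointCount(0, s.length());
        return s.codePointAt(indexOfCodePoint(s, count - 1));
    }

    public static void main(String[] args) {
        String greeting = "Hello\uD835\uDD6b";
        int count = greeting.codePointCount(0, greeting.length()); // 6
        // Offsetting by all 6 code points gives length(), one past the end.
        System.out.println("offset by 6 -> " + indexOfCodePoint(greeting, count)); // 7
        // The sixth code point starts 5 code points after index 0.
        int index = indexOfCodePoint(greeting, count - 1);
        System.out.println("offset by 5 -> " + index); // 5
        int cp = lastCodePoint(greeting);
        // (char) cp would keep only the low 16 bits of U+1D56B;
        // Character.toChars rebuilds the full surrogate pair.
        System.out.println("last code point = " + new String(Character.toChars(cp)));
    }
}
```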
To see why, consider that "".offsetByCodePoints(0, 0) == 0: to count past all 0 code points, you have to count past all 0 chars.
Extrapolating to your string: to count past all 6 code points, you have to count past all 7 chars.
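The same counting argument can be tabulated: for each k from 0 up to the number of code points, record where offsetByCodePoints(0, k) lands. A small sketch (the offsets helper is hypothetical):

```java
import java.util.Arrays;

public class OffsetTable {
    // offsets(s)[k] is the char index reached after skipping k code points from index 0.
    static int[] offsets(String s) {
        int count = s.codePointCount(0, s.length());
        int[] result = new int[count + 1];
        for (int k = 0; k <= count; k++) {
            result[k] = s.offsetByCodePoints(0, k);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(offsets("")));                  // [0]
        System.out.println(Arrays.toString(offsets("Hello\uD835\uDD6b"))); // [0, 1, 2, 3, 4, 5, 7]
    }
}
```

The only step of 2 is the last one, where the surrogate pair is skipped, and the final entry always equals length().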
Perhaps seeing codePointAt in use will make this clearer. This is the idiomatic way to iterate over all the code points in a string (or CharSequence):
    for (int charIndex = 0, nChars = s.length(), codepoint;
            charIndex < nChars;
            charIndex += Character.charCount(codepoint)) {
        // For a CharSequence that is not a String, use Character.codePointAt(s, charIndex).
        codepoint = s.codePointAt(charIndex);
        // Do something with codepoint.
    }
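Applied to the question's string, the idiom yields six code points. The sketch below wraps it in a method (names are illustrative) that uses Character.codePointAt(CharSequence, int) so it accepts any CharSequence. It then checks the result against CharSequence.codePoints(), available from Java 8:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class IterateCodePoints {
    // The loop idiom from the answer, collecting each code point.
    static List<Integer> byLoop(CharSequence s) {
        List<Integer> result = new ArrayList<>();
        for (int charIndex = 0, nChars = s.length(), codepoint;
                charIndex < nChars;
                charIndex += Character.charCount(codepoint)) {
            codepoint = Character.codePointAt(s, charIndex);
            result.add(codepoint);
        }
        return result;
    }

    // Java 8+ alternative: CharSequence.codePoints() returns an IntStream.
    static List<Integer> byStream(CharSequence s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String greeting = "Hello\uD835\uDD6b";
        for (int cp : byLoop(greeting)) {
            System.out.printf("U+%04X %s%n", cp, new String(Character.toChars(cp)));
        }
        System.out.println(byLoop(greeting).equals(byStream(greeting))); // true
    }
}
```

The loop variant makes the char-index bookkeeping explicit; the stream variant hides it when only the code points themselves are needed.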
A similar question, "java - What arguments does the Java function offsetByCodePoints really take?", can be found on Stack Overflow: https://stackoverflow.com/questions/8521226/