java - Implement an algorithm to determine whether a string has all unique characters (characters above U+FFFF)

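I'm practicing sample interview questions, and one of them is: "Implement an algorithm to determine if a string has all unique characters."

This is easy when we assume the string is ASCII/ANSI (see implement-an-algorithm-to-determine-if-a-string-has-all-unique-charact).

But my question is: how should this be solved if the string can contain, for example, hieroglyphic symbols or any other characters whose code points are greater than U+FFFF?

So, if I understand correctly, a solution is easy to come up with when the given string only contains characters in the range U+0000 to U+FFFF, since each of them fits in a single 16-bit char. But what should I do if I encounter characters whose code points are greater than U+FFFF?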

Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF)

But I don't know how to solve the puzzle in that case. How should I handle these surrogate pairs?

Thanks!

Best answer

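Java 8 has a CharSequence#codePoints method that produces an IntStream of the Unicode code points in the string. From there, it's just a matter of writing code to test the uniqueness of the elements in the IntStream.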
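A minimal sketch of that idea, assuming Java 8 or later. The class and method names are made up for illustration, and comparing the distinct count with the total count is just one of several ways to test uniqueness. Note that codePoints() passes an unpaired surrogate through as its own value, so malformed UTF-16 is not rejected here.

```java
class UniqueCodePoints {
    // Hypothetical helper: true if no Unicode code point occurs twice in s.
    static boolean hasAllUniqueCodePoints(String s) {
        // codePoints() yields one int per code point, so a surrogate pair
        // counts as a single element rather than two chars.
        long total = s.codePoints().count();
        long distinct = s.codePoints().distinct().count();
        return total == distinct;
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) lies outside the BMP and needs two chars.
        String gClef = new String(Character.toChars(0x1D11E));
        System.out.println(hasAllUniqueCodePoints("abc" + gClef));       // prints "true"
        System.out.println(hasAllUniqueCodePoints(gClef + "a" + gClef)); // prints "false"
    }
}
```

This version walks the string twice and does not stop at the first duplicate, which is usually fine for interview-sized inputs.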

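If you're still on Java 7 or lower, there are code-point-based methods that can solve this problem as well, but they are considerably more complicated to use. You have to loop through the chars of the string and check the value of each one to determine whether you're dealing with a surrogate pair. Something like this (completely untested):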

for (int i = 0; i < str.length(); ) {
    // codePointAt combines a valid surrogate pair starting at i into one
    // code point. An unpaired surrogate (e.g., a high surrogate `char`
    // at the end of the string's `char[]`) is returned as-is.
    int codepoint = str.codePointAt(i);
    // do stuff with codepoint...

    // Advance by 2 chars for a supplementary code point, otherwise by 1.
    i += Character.charCount(codepoint);
}
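To make the "do stuff with codepoint" step concrete for the uniqueness question, here is a sketch that should also compile on Java 7. The class and method names are hypothetical, and using a java.util.BitSet indexed by code point is an assumption on my part; a HashSet of Integer would work just as well.

```java
import java.util.BitSet;

class UniqueCodePointsLoop {
    // Hypothetical helper: true if no Unicode code point occurs twice in str.
    static boolean hasAllUniqueCodePoints(String str) {
        // One bit per possible code point (U+0000 through U+10FFFF).
        BitSet seen = new BitSet(Character.MAX_CODE_POINT + 1);
        for (int i = 0; i < str.length(); ) {
            int codePoint = str.codePointAt(i);
            if (seen.get(codePoint)) {
                return false; // this code point was already encountered
            }
            seen.set(codePoint);
            // Skip both chars of a surrogate pair, or a single char otherwise.
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1D11E and U+1D11F share the high surrogate \uD834 but are distinct.
        System.out.println(hasAllUniqueCodePoints("\uD834\uDD1E\uD834\uDD1F")); // prints "true"
        System.out.println(hasAllUniqueCodePoints("\uD834\uDD1E\uD834\uDD1E")); // prints "false"
    }
}
```

The first call shows why comparing individual char values is not enough: the two strings' characters share a high surrogate, so a char-based check would wrongly report a duplicate.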

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/36871838/
