Objective-C: how to get the Unicode code point of a character

I want to get the Unicode code point for a given Unicode character in Objective-C. The NSString documentation says it uses UTF-16 encoding internally, and states:

The NSString class has two primitive methods—length and characterAtIndex:—that provide the basis for all other methods in its interface. The length method returns the total number of Unicode characters in the string. characterAtIndex: gives access to each character in the string by index, with index values starting at 0.

This seems to assume that the characterAtIndex: method is Unicode-aware. However, it returns a unichar, which is a 16-bit unsigned integer type.

- (unichar)characterAtIndex:(NSUInteger)index

My questions are:

Q1: How are Unicode code points above U+FFFF represented?
Q2: If Q1 makes sense, is there a method to get the Unicode code point of a given Unicode character in Objective-C?

Thanks.

Best answer

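The short answer to "Q1: How are Unicode code points above U+FFFF represented?" is: you need to be UTF-16 aware and handle surrogate code points correctly. The information and links below should give you pointers and example code to help you do that.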

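The NSString documentation is correct. However, where you say "NSString says it uses UTF-16 encoding internally", it is more accurate to say that NSString's public/abstract interface is UTF-16 based. The difference is that this leaves the internal representation of the string as a private implementation detail, while public methods such as characterAtIndex: and length always operate in terms of UTF-16.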
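One practical consequence is that a UTF-16 length counts 16-bit code units, not code points, so a character above U+FFFF contributes 2 to the count. The following is a minimal sketch in plain C (which also compiles inside an Objective-C file) of counting code points in a UTF-16 buffer. The function name utf16_code_point_count is hypothetical, and uint16_t stands in for unichar on the assumption that both are 16-bit unsigned types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts Unicode code points in a UTF-16 buffer.
 * A high surrogate (0xD800-0xDBFF) immediately followed by a low
 * surrogate (0xDC00-0xDFFF) is counted as one code point. An unpaired
 * surrogate is counted as one (ill-formed) unit. */
size_t utf16_code_point_count(const uint16_t *units, size_t length)
{
    assert(units != NULL || length == 0);

    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        count++;
        if (units[i] >= 0xD800 && units[i] <= 0xDBFF &&
            i + 1 < length &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* skip the low half of the pair */
        }
    }
    return count;
}
```

For example, the four code units 0x0041, 0xD83D, 0xDE00, 0x0042 ("A", U+1F600, "B") make up three code points, even though a UTF-16 based length reports four.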

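The reason for this is that it tends to strike the best balance between older ASCII-centric strings and Unicode-aware strings, largely because Unicode is a strict superset of ASCII (ASCII uses 7 bits, for 128 characters, which map to the first 128 Unicode code points).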

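To represent Unicode code points that are > U+FFFF, which obviously exceeds what can be represented in a single UTF-16 code unit, UTF-16 uses special surrogate code points that form a surrogate pair. The two halves of the pair combine to encode one Unicode code point > U+FFFF. You can find details about this at: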

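Unicode UTF FAQ - What are surrogates?
Unicode UTF FAQ - What's the algorithm to convert from UTF-16 to character codes?

Although the official Unicode UTF FAQ - How do I write a UTF converter? now recommends using International Components for Unicode, it used to recommend some code that was officially sanctioned and maintained by Unicode. Though no longer available directly from Unicode.org, you can still find copies of the "no longer official" example code in various open-source projects: ConvertUTF.c and ConvertUTF.h. If you need to roll your own, I strongly recommend examining this code first, as it is well tested.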
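As a rough illustration of the surrogate arithmetic, here is a minimal sketch in plain C (usable from Objective-C) that decodes the code point starting at a given index of a UTF-16 buffer. It is not the ConvertUTF code. The name utf16_decode_at and the choice to return U+FFFD for an unpaired surrogate are assumptions made for this example, and uint16_t stands in for unichar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes the code point that starts at units[index].
 * Stores the number of code units consumed (1 or 2) in *consumed.
 * Returns U+FFFD (REPLACEMENT CHARACTER) for an unpaired surrogate;
 * that policy is an assumption of this sketch. */
uint32_t utf16_decode_at(const uint16_t *units, size_t length,
                         size_t index, size_t *consumed)
{
    assert(units != NULL && consumed != NULL && index < length);

    uint16_t first = units[index];
    *consumed = 1;

    if (first >= 0xD800 && first <= 0xDBFF) {
        /* High surrogate: needs a low surrogate right after it. */
        if (index + 1 < length) {
            uint16_t second = units[index + 1];
            if (second >= 0xDC00 && second <= 0xDFFF) {
                *consumed = 2;
                return 0x10000u
                     + (((uint32_t)first - 0xD800u) << 10)
                     + ((uint32_t)second - 0xDC00u);
            }
        }
        return 0xFFFDu; /* unpaired high surrogate */
    }
    if (first >= 0xDC00 && first <= 0xDFFF) {
        return 0xFFFDu; /* low surrogate with no preceding high one */
    }
    return first; /* ordinary BMP code unit */
}
```

With an NSString, the buffer could be filled using getCharacters:range: and then walked by advancing the index by *consumed after each call. For the pair 0xD83D, 0xDE00 the function returns 0x1F600 and reports two code units consumed.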

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/4726593/
