unicode - Why don't .ords and .chars agree?

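My understanding of .chars is that it returns "the number of characters in the string in graphemes". My understanding of .ords is that it returns "a list of codepoint numbers, one for the base character of each grapheme in the string". In other words, .chars returns the number of graphemes, and .ords returns one codepoint (the base) per grapheme. However, the behavior I see in Rakudo 2016.07.1 on MoarVM 2016.07 does not seem to match this: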

> "\x[2764]\x[fe0e]".chars
1
> "\x[2764]\x[fe0e]".ords.fmt("U+%04x")
U+2764 U+fe0e
> "e\x[301]".ords.fmt("U+%04x")
U+00e9
> "0\x[301]".ords.fmt("U+%04x")
U+0030

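For HEAVY BLACK HEART followed by VARIATION SELECTOR-15 (the text presentation ❤︎, as opposed to the emoji presentation ❤️, which is U+2764 U+fe0f), .chars returns the expected value of 1, but .ords returns two codepoints instead of just the base (I expected only U+2764). Even more confusing: if you call .ords on LATIN SMALL LETTER E combined with COMBINING ACUTE ACCENT, you get U+00e9 (LATIN SMALL LETTER E WITH ACUTE). I expected U+0065, since LATIN SMALL LETTER E is the base codepoint. When the string has no precomposed NFC form (for example 0́, for which I get U+0030), I do get the expected result.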
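One way to see what is going on is to print, for each test string, the grapheme count, the .ords output, and the full canonical decomposition (NFD) side by side. This is a sketch that assumes a Rakudo recent enough to provide Str.NFD; the .ords output for "0\x[301]" on a current release may differ from the 2016.07 output shown above. The output above is consistent with .ords reporting the codepoints of the string's NFC (composed) form rather than one base codepoint per grapheme, and NFD is where the base U+0065 shows up.

```raku
# For each string, print: grapheme count | codepoints reported by .ords |
# codepoints of the NFD (fully decomposed) form, formatted as U+XXXX.
my @strings = "\x[2764]\x[fe0e]", "e\x[301]", "0\x[301]";
for @strings -> $s {
    say $s.chars, ' | ',
        $s.ords.list.fmt("U+%04x"), ' | ',
        $s.NFD.list.fmt("U+%04x");
}
```

For "e\x[301]" the .ords column shows the composed U+00e9, while the NFD column shows U+0065 U+0301.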

Is my understanding of .chars and .ords flawed, or is this a bug?

Best answer

The documentation for the .ords method was wrong. A core developer has just updated it in this commit:

https://github.com/perl6/doc/commit/12ec5fc35e

The change should appear on the website soon.
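If what you actually want is what the old documentation described (one codepoint per grapheme, taken from its base character), one approximation is to split the string into graphemes with .comb and take the first codepoint of each grapheme's NFD decomposition. The sub name base-codepoints is hypothetical and not part of Rakudo, and the first NFD codepoint is not always a true base character (Hangul syllables, for instance, decompose into jamo).

```raku
# Hypothetical helper (approximation only): for each grapheme, return the
# first codepoint of its NFD decomposition, which for Latin letters with
# combining accents is the unaccented base letter.
sub base-codepoints(Str $s) {
    $s.comb.map({ .NFD.list[0] }).list
}

say base-codepoints("e\x[301]").fmt("U+%04x");   # U+0065
```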

A similar question, "unicode - Why don't .ords and .chars agree?", can be found on Stack Overflow: https://stackoverflow.com/questions/39603451/