javascript - In what cases does the normalize('NFKC') method work?

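I tried the normalize('NFKC') method with various characters, but it doesn't seem to do anything. The same can't be said of NFC: whenever possible, normalize('NFC') replaces multiple code points with a single code point. For example: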

let t1 = `\u00F4`; //ô
let t2 = `\u006F\u0302`; //ô
console.log(t2.normalize('NFC') == t1); //true

Here is the equivalent example for NFKC, which never works:

let s1 = '\uFB00'; //"ﬀ"
let s2 = '\u0066\u0066'; //"ff"
console.log(s2.normalize('NFKC') == s1); //false

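I used to think that NFKC replaces multiple code points with the single code point of the corresponding compatibility character. Put simply, I thought NFKC would replace \u0066\u0066 with \uFB00.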

If NFKC doesn't work that way, then how does it work?

Best answer

The thing is that NFKC (like NFKD) normalizes under both compatibility equivalence and canonical equivalence.

Unicode says:

The type of full decomposition chosen depends on which Unicode Normalization Form is involved. For NFC or NFD, one does a full canonical decomposition, which makes use of only canonical Decomposition_Mapping values. For NFKC or NFKD, one does a full compatibility decomposition, which makes use of canonical and compatibility Decomposition_Mapping values.

This makes sense, because MDN says:

All canonically equivalent sequences are also compatible, but not vice versa.
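The difference between the two kinds of decomposition can be seen directly with the decomposing forms NFD and NFKD. The snippet below is only a small sketch; the `codePoints` helper is a hypothetical name introduced here for illustration, not part of any API.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
const codePoints = (s) =>
  [...s].map((c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

const oCircumflex = '\u00F4'; // "ô": has a canonical decomposition
const ffLigature = '\uFB00';  // "ﬀ": has only a compatibility decomposition

// Canonical decomposition is applied by both NFD and NFKD.
const nfdO = codePoints(oCircumflex.normalize('NFD'));
const nfkdO = codePoints(oCircumflex.normalize('NFKD'));

// Compatibility decomposition is applied by NFKD only.
const nfdFf = codePoints(ffLigature.normalize('NFD'));
const nfkdFf = codePoints(ffLigature.normalize('NFKD'));

console.log(nfdO);   // [ 'U+006F', 'U+0302' ]
console.log(nfkdO);  // [ 'U+006F', 'U+0302' ]
console.log(nfdFf);  // [ 'U+FB00' ]
console.log(nfkdFf); // [ 'U+0066', 'U+0066' ]
```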

It is also worth noting that NFKC treats the two kinds of equivalence differently. For canonical equivalence, NFKC produces the same result as NFC. For example:

//"ô" (U+00F4) -> "o" (U+006F) + " ̂" (U+0302) -> "ô" (U+00F4)
let c1 = `\u006F\u0302`; //ô
console.log(c1.normalize('NFKC').length); //1

Compatibility normalization, however, works differently in this form. The spec says:

Normalization Form KC does not attempt to map character sequences to compatibility composites. For example, a compatibility composition of “office” does not produce “o\uFB03ce”, even though “\uFB03” is a character that is the compatibility equivalent of the sequence of three characters “ffi”. In other words, the composition phase of NFC and NFKC are the same—only their decomposition phase differs, with NFKC applying compatibility decompositions.

For example:

//"ﬀ"(U+FB00) -> "f"(U+0066) + "f"(U+0066) -> "f"(U+0066) + "f"(U+0066)
let c2 = '\u0066\u0066'; //ff
console.log(c2.normalize('NFKC').length); //2
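To answer the title question more directly: NFKC has a visible effect when the input already contains compatibility characters, which it replaces with their plain equivalents, never the other way round. The following is a minimal sketch with a few sample characters chosen here for illustration (they do not come from the question); the variable names are arbitrary.

```javascript
// Sample compatibility characters (chosen for illustration).
const samples = {
  ligature: '\uFB00',       // "ﬀ" LATIN SMALL LIGATURE FF
  fullwidthA: '\uFF21',     // "Ａ" FULLWIDTH LATIN CAPITAL LETTER A
  circledOne: '\u2460',     // "①" CIRCLED DIGIT ONE
  superscriptTwo: '\u00B2', // "²" SUPERSCRIPT TWO
};

// Normalize every sample with the given form and collect the results.
const normalizeAll = (form) =>
  Object.fromEntries(
    Object.entries(samples).map(([name, ch]) => [name, ch.normalize(form)])
  );

const nfkcResults = normalizeAll('NFKC');
const nfcResults = normalizeAll('NFC');

console.log(nfkcResults); // { ligature: 'ff', fullwidthA: 'A', circledOne: '1', superscriptTwo: '2' }
console.log(nfcResults);  // unchanged: NFC does not apply compatibility mappings

// The "office" case from the spec quote: the ligature is expanded,
// but plain "ffi" is never composed back into U+FB03.
const withLigature = 'o\uFB03ce'.normalize('NFKC');
const plain = 'office'.normalize('NFKC');
console.log(withLigature === 'office'); // true
console.log(plain === 'office');        // true
```

So comparing `s2.normalize('NFKC') == s1` in the question fails, while `s1.normalize('NFKC') == s2` would succeed.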

The original question, "javascript - In what cases does the normalize('NFKC') method work?", can be found on Stack Overflow: https://stackoverflow.com/questions/69058397/
