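I tried the normalize('NFKC') method with various characters, but it doesn't work. Fortunately, the same can't be said of NFC: whenever possible, normalize('NFC') replaces multiple code points with a single code point. For example: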
let t1 = `\u00F4`; //ô
let t2 = `\u006F\u0302`; //ô
console.log(t2.normalize('NFC') == t1); //true
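The canonical round trip can also be shown in both directions. This is a small sketch; the variable names are illustrative and not taken from the question. NFD splits the precomposed character, and NFC puts it back together:

```javascript
// Precomposed ô (U+00F4) vs. o + COMBINING CIRCUMFLEX ACCENT (U+0302)
const precomposed = '\u00F4';
const decomposed = '\u006F\u0302';

// NFD splits the precomposed character into two code points
console.log(precomposed.normalize('NFD').length); // 2
// NFC combines the pair back into a single code point
console.log(decomposed.normalize('NFC').length); // 1
// Both spellings normalize to the same string
console.log(precomposed.normalize('NFC') === decomposed.normalize('NFC')); // true
```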
Here is an NFKC example, but it never works:
let s1 = '\uFB00'; //"ﬀ"
let s2 = '\u0066\u0066'; //"ff"
console.log(s2.normalize('NFKC') == s1); //false
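I thought NFKC replaced multiple code points with the single code point of the corresponding compatibility character. In other words, I expected NFKC to replace \u0066\u0066 with \uFB00.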
If NFKC doesn't work that way, then how does it work?
Best answer
The point is that NFKC (like NFKD) applies both compatibility and canonical equivalence during normalization.
The type of full decomposition chosen depends on which Unicode Normalization Form is involved. For NFC or NFD, one does a full canonical decomposition, which makes use of only canonical Decomposition_Mapping values. For NFKC or NFKD, one does a full compatibility decomposition, which makes use of canonical and compatibility Decomposition_Mapping values.
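The difference between the two kinds of decomposition is easy to observe on the ligature from the question. A minimal sketch (variable names are illustrative): NFD leaves U+FB00 alone, NFKD breaks it into two letters, and NFKD still applies canonical mappings as well:

```javascript
const ligature = '\uFB00';    // "ﬀ" LATIN SMALL LIGATURE FF
const oCircumflex = '\u00F4'; // "ô"

// Canonical decomposition (NFD): U+FB00 has no canonical mapping, so it is unchanged
console.log(ligature.normalize('NFD') === ligature); // true
// Compatibility decomposition (NFKD): U+FB00 becomes two plain "f" letters
console.log(ligature.normalize('NFKD')); // ff
// NFKD still uses canonical mappings too: ô -> o + U+0302
console.log(oCircumflex.normalize('NFKD').length); // 2
```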
This makes sense, because, as MDN puts it:
All canonically equivalent sequences are also compatible, but not vice versa.
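One way to see this one-way relationship is to compare strings after normalizing both sides. The helper names below are hypothetical (they are not part of any standard API), and the examples are chosen for illustration: é has a canonical mapping, while SUPERSCRIPT TWO (U+00B2) only has a compatibility mapping to "2".

```javascript
// Hypothetical helpers: two strings count as equivalent when their
// normalized forms are identical.
function canonicallyEquivalent(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}
function compatibilityEquivalent(a, b) {
  return a.normalize('NFKC') === b.normalize('NFKC');
}

// é as one code point vs. e + COMBINING ACUTE ACCENT: equivalent both ways
console.log(canonicallyEquivalent('\u00E9', 'e\u0301'));   // true
console.log(compatibilityEquivalent('\u00E9', 'e\u0301')); // true

// SUPERSCRIPT TWO vs. "2": compatible, but not canonically equivalent
console.log(canonicallyEquivalent('\u00B2', '2'));   // false
console.log(compatibilityEquivalent('\u00B2', '2')); // true
```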
It is also worth noting, though, that NFKC treats the two kinds of equivalence differently. Its canonical normalization works exactly like NFC. For example:
//"ô" (U+00F4) -> "o" (U+006F) + "◌̂" (U+0302) -> "ô" (U+00F4)
let c1 = `\u006F\u0302`; //ô
console.log(c1.normalize('NFKC').length); //1
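To check the claim that the canonical part of NFKC matches NFC, here is a small sketch using a few decomposed Latin letters (the sample strings are my own choice, not from the answer). When no compatibility mapping is involved, the two forms agree; they diverge only on characters such as the ligature:

```javascript
// Decomposed ô, é and Å: only canonical mappings are involved
const samples = ['o\u0302', 'e\u0301', 'A\u030A'];

for (const s of samples) {
  // With no compatibility characters, NFC and NFKC produce identical output
  console.log(s.normalize('NFC') === s.normalize('NFKC'), s.normalize('NFKC').length); // true 1
}

// The two forms only diverge when a compatibility mapping applies
console.log('\uFB00'.normalize('NFC') === '\uFB00'.normalize('NFKC')); // false
```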
Its compatibility normalization, however, works differently. The spec says:
Normalization Form KC does not attempt to map character sequences to compatibility composites. For example, a compatibility composition of “office” does not produce “o\uFB03ce”, even though “\uFB03” is a character that is the compatibility equivalent of the sequence of three characters “ffi”. In other words, the composition phase of NFC and NFKC are the same—only their decomposition phase differs, with NFKC applying compatibility decompositions.
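The "office" example from the spec can be reproduced directly. A minimal sketch, with illustrative variable names: NFKC decomposes U+FB03 into "ffi", but it never composes "ffi" back into the ligature.

```javascript
const plain = 'office';
const withLigature = 'o\uFB03ce'; // "oﬃce" with U+FB03 LATIN SMALL LIGATURE FFI

// Decomposition phase: the ligature is replaced by "ffi"
console.log(withLigature.normalize('NFKC') === plain); // true
// Composition phase: "ffi" is never turned back into U+FB03
console.log(plain.normalize('NFKC') === plain);        // true
console.log(plain.normalize('NFKC') === withLigature); // false
```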
For example:
//"ﬀ" (U+FB00) -> "f" (U+0066) + "f" (U+0066) -> "f" (U+0066) + "f" (U+0066)
let c2 = '\u0066\u0066'; //ff
console.log(c2.normalize('NFKC').length); //2
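Applied to the question's own s1 and s2, this means the comparison only succeeds in one direction. The sketch below reuses the question's variable names; normalizing both operands is the usual way to test compatibility equivalence:

```javascript
const s1 = '\uFB00';       // "ﬀ"
const s2 = '\u0066\u0066'; // "ff"

// NFKC never produces the ligature, so the comparison from the question fails
console.log(s2.normalize('NFKC') === s1); // false
// Normalizing the ligature side instead decomposes it to "ff"
console.log(s1.normalize('NFKC') === s2); // true
// Normalizing both sides tests compatibility equivalence
console.log(s1.normalize('NFKC') === s2.normalize('NFKC')); // true
```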
Regarding "javascript - In what cases does the normalize('NFKC') method work?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69058397/