sql-server - Why does SQL Server consider N'㐢㐢㐢㐢' and N'㐢㐢㐢' to be equal?

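We are testing our application for Unicode compatibility and have been picking random characters outside the Latin character set for our tests.

On both Latin-collated and Japanese-collated systems, the following equality holds (U+3422):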

N'㐢㐢㐢㐢' = N'㐢㐢㐢'

but the following does not (U+30C1):

N'チチチチ' = N'チチチ'

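We discovered this when a test case using the first example (with U+3422) violated a unique index. Do we need to be more selective about the characters we use for testing? Obviously we do not know the semantic meaning of the comparisons above. Would this behavior be obvious to a native speaker?

Best answer

Michael Kaplan has a blog post that explains how Unicode strings are compared. It comes down to weights: each character needs a weight in the collation being used, and a character without one is ignored during comparison. A string made up entirely of such characters therefore compares equal to the empty string, which is why four repetitions of U+3422 and three repetitions compare equal to each other.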

Sorting it all Out: The jury will give this string no weight
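
The "no weight" behavior can be checked directly with a few comparisons. This is only a sketch: Latin1_General_CI_AS is an assumption standing in for the unnamed "Latin" collation from the question, and the column aliases are made up for illustration.

```sql
-- Assumption: Latin1_General_CI_AS stands in for the "Latin" collation
-- mentioned in the question; substitute the collation you actually use.
SELECT
    -- Four vs. three repetitions of U+3422
    CASE WHEN N'㐢㐢㐢㐢' = N'㐢㐢㐢' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS ideograph_four_vs_three,
    -- U+3422 repeated vs. the empty string
    CASE WHEN N'㐢㐢㐢㐢' = N'' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS ideograph_vs_empty,
    -- Four vs. three repetitions of U+30C1
    CASE WHEN N'チチチチ' = N'チチチ' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS katakana_four_vs_three;
```

If the collation gives U+3422 no weight, the first two columns should report 'equal', and the katakana column should report 'not equal', matching what the question describes.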

In SQL Server, this weight depends on the collation in effect. Microsoft added appropriate collations for CJK Unified Ideographs in Windows XP/2003 and SQL Server 2005. This post recommends Chinese_Simplified_Pinyin_100_CI_AS or Chinese_Simplified_Stroke_Order_100_CI_AS:

You can always use any binary and binary2 collations although it wouldn't give you Linguistic correct result. For SQL Server 2005, you SHOULD use Chinese_PRC_90_CI_AS or Chinese_PRC_Stroke_90_CI_AS which support surrogate pair comparison (but not linguistic). For SQL Server 2008, you should use Chinese_Simplified_Pinyin_100_CI_AS and Chinese_Simplified_Stroke_Order_100_CI_AS which have better linguistic surrogate comparison. I do suggest you use these collation as your server/database/table collation instead of passing the collation name during comparison.

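So the following SQL statement works as expected; that is, the ideograph is no longer treated as equal to the empty string: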

select * from MyTable where N'' = N'㐀' COLLATE Chinese_Simplified_Stroke_Order_100_CI_AS;
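
The quoted advice is to set the collation on the server, database, or table rather than passing it in each comparison. A minimal sketch of the column-level variant follows, applied to the unique-index scenario from the question. The table, column, and index names are hypothetical, and the _100 collations require SQL Server 2008 or later.

```sql
-- Hypothetical table, column, and index names, for illustration only.
CREATE TABLE dbo.UnicodeTest (
    TestValue nvarchar(50)
        COLLATE Chinese_Simplified_Stroke_Order_100_CI_AS NOT NULL
);

-- The unique index compares values using the column's collation.
CREATE UNIQUE INDEX UX_UnicodeTest_TestValue
    ON dbo.UnicodeTest (TestValue);

-- Under a collation that assigns U+3422 a weight, these two rows are
-- distinct, so the second insert should not raise a duplicate-key error.
INSERT INTO dbo.UnicodeTest (TestValue) VALUES (N'㐢㐢㐢㐢');
INSERT INTO dbo.UnicodeTest (TestValue) VALUES (N'㐢㐢㐢');
```

Declaring the collation on the column means every comparison, index, and constraint on it uses the same rules, so individual queries do not need their own COLLATE clause.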

A list of all supported collations can be found on MSDN:

SQL Server 2008 Books Online: Windows Collation Name
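
The collations available on a particular instance can also be listed from within SQL Server using the built-in sys.fn_helpcollations function. The filter pattern below is just an example that narrows the list to the Chinese collations discussed above.

```sql
-- List the Chinese collations this instance supports.
-- [_] matches a literal underscore inside a LIKE pattern.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE N'Chinese[_]%'
ORDER BY name;

-- Show the collation of the current database, for comparison.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;
```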

For this question (sql-server - Why does SQL Server consider N'㐢㐢㐢㐢' and N'㐢㐢㐢' to be equal?), the original can be found on Stack Overflow: https://stackoverflow.com/questions/2818583/
