java - Strange behavior in case-insensitive string comparison

This happens in both C# and Java, so I don't think it's a bug. I'd just like to know why.

var s = "𐐁";
var lower = s.ToLower();
var upper = s.ToUpper();

if (!lower.Equals(upper, StringComparison.OrdinalIgnoreCase))
{
    //How can this happen?
}

According to this page, the lowercase of "𐐁" is "𐐩", so the two should be equal when compared with the IgnoreCase option. Why are they not equal?
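The mismatch shows up once you look at how the strings are stored. Below is a minimal Java sketch that prints the code point and the UTF-16 code units (chars) of "𐐁" and of its lowercase form. The class name and the helpers `codeUnits`/`codePoints` are illustrative only and do not come from the original post.

```java
import java.util.Locale;
import java.util.stream.Collectors;

public class SurrogateDump {
    // Formats every UTF-16 code unit (char) of s as hex, e.g. "D801 DC01".
    static String codeUnits(String s) {
        return s.chars()
                .mapToObj(c -> String.format("%04X", c))
                .collect(Collectors.joining(" "));
    }

    // Formats every Unicode code point of s, e.g. "U+10401".
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String upper = "\uD801\uDC01";                 // 𐐁 DESERET CAPITAL LETTER LONG E
        String lower = upper.toLowerCase(Locale.ROOT); // 𐐩 DESERET SMALL LETTER LONG E
        System.out.println(codePoints(upper) + " -> " + codeUnits(upper)); // U+10401 -> D801 DC01
        System.out.println(codePoints(lower) + " -> " + codeUnits(lower)); // U+10429 -> D801 DC29
    }
}
```

Each letter is a single code point outside the Basic Multilingual Plane, so each occupies two chars, and the two strings differ only in their second char.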

Best Answer

In defense of the Java API: the documentation of the method String.equalsIgnoreCase never claims that it works "as expected" on arbitrary Unicode code points. It says:

Two characters c1 and c2 are considered the same ignoring case if at least one of the following is true:

The two characters are the same (as compared by the == operator)

Applying the method Character.toUpperCase(char) to each character produces the same result

Applying the method Character.toLowerCase(char) to each character produces the same result

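So the documentation explicitly states that it applies Character.toUpperCase to a char, that is, to a UTF-16 code unit, not to a Unicode code point. Here each letter is a surrogate pair (U+10401 is D801 DC01, U+10429 is D801 DC29), and a lone surrogate has no case mapping, so the low surrogates DC01 and DC29 never compare equal under this rule.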
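To make the documented rule concrete, here is a sketch that applies the three quoted conditions literally, one char at a time. It mirrors the rule as quoted above; it is not the JDK's actual source, and `charwiseEqualsIgnoreCase` is a hypothetical name.

```java
public class CharwiseIgnoreCase {
    // Applies the documented rule literally: compare one char (UTF-16 code unit)
    // at a time, using the char overloads of Character.toUpperCase/toLowerCase.
    static boolean charwiseEqualsIgnoreCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            boolean same = c1 == c2
                    || Character.toUpperCase(c1) == Character.toUpperCase(c2)
                    || Character.toLowerCase(c1) == Character.toLowerCase(c2);
            if (!same) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String a = "\uD801\uDC01"; // 𐐁
        String b = "\uD801\uDC29"; // 𐐩
        // The low surrogates DC01 and DC29 have no case mappings of their own,
        // so the char overloads return them unchanged and the third char pair fails.
        System.out.println(charwiseEqualsIgnoreCase("Hello", "hELLO")); // true
        System.out.println(charwiseEqualsIgnoreCase(a, b));             // false
    }
}
```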

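If you instead use the code point overloads, Character.toUpperCase(int codePoint) or Character.toLowerCase(int codePoint), on each code point, the comparison works as expected. Here is a short example in Scala (it uses exactly the same Java API; the higher-order forall method is hopefully self-explanatory):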

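val a = "𐐁"
val b = "𐐩"
val (xs, ys) = (a.codePoints.toArray, b.codePoints.toArray)
// zip silently drops extra elements, so compare the code point counts first
xs.length == ys.length && (xs zip ys).forall {
  case (x, y) =>
    Character.toLowerCase(x) == Character.toLowerCase(y)
}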

This prints

true

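as expected. So why does String.equalsIgnoreCase work on char values at all? I think it is safe to blame backward compatibility: the char-based rule predates the code point overloads such as Character.toUpperCase(int), which were only added in Java 5.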
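For readers who want the Scala idea in plain Java, here is a sketch under the same approach: split both strings into code points and apply the three quoted conditions using the int overloads of Character. The class and method names are hypothetical.

```java
public class CodePointIgnoreCase {
    // Compares two strings code point by code point, using the int (code point)
    // overloads of Character.toUpperCase/toLowerCase instead of the char ones.
    static boolean codePointEqualsIgnoreCase(String a, String b) {
        int[] xs = a.codePoints().toArray();
        int[] ys = b.codePoints().toArray();
        if (xs.length != ys.length) {
            return false;
        }
        for (int i = 0; i < xs.length; i++) {
            int x = xs[i];
            int y = ys[i];
            boolean same = x == y
                    || Character.toUpperCase(x) == Character.toUpperCase(y)
                    || Character.toLowerCase(x) == Character.toLowerCase(y);
            if (!same) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(codePointEqualsIgnoreCase("\uD801\uDC01", "\uD801\uDC29"));  // true
        System.out.println(codePointEqualsIgnoreCase("\uD801\uDC01", "\uD801\uDC01x")); // false
    }
}
```

This is still a simple per-code-point mapping, not full Unicode case folding, so pairs whose case mapping changes length, such as "ß" and "SS", still compare unequal.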

The original question "java - Strange behavior in case-insensitive string comparison" is on Stack Overflow: https://stackoverflow.com/questions/52777624/

