java - Which Locale should I specify when calling String#toLowerCase?

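In Java, the no-argument String#toLowerCase method uses the system's default Locale to decide how to lowercase. If I am lowercasing some ASCII text and want to be sure it is handled as expected, which Locale should I use?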

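Edit: I am mainly concerned with programming identifiers, such as table and column names in a schema. So I want English lowercasing rules to apply.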

Locale.ROOT is documented as the language/country-neutral locale for locale-sensitive operations.

Locale.ENGLISH is presumably a safe choice as well.

Best answer

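Yes, Locale.ENGLISH is a safe choice for case operations on things like programming language identifiers and URL parts, because it involves no special casing rules: under English rules, every 7-bit ASCII character case-converts to another 7-bit ASCII character.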
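As a minimal sketch of that advice, the helper below pins the locale explicitly, so the result no longer depends on the JVM's default. The class name `IdentifierCase` and the method `normalize` are hypothetical names introduced only for this illustration; `main` temporarily switches the default locale to Turkish to show the difference.

```java
import java.util.Locale;

final class IdentifierCase {
    // Hypothetical helper: lowercases an identifier with English rules,
    // regardless of the JVM's default locale.
    static String normalize(String identifier) {
        return identifier.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a machine whose default locale is Turkish.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));

            // The no-argument overload follows the default locale,
            // so 'I' becomes dotless i (U+0131), not ASCII 'i'.
            String implicit = "ORDER_ID".toLowerCase();
            System.out.println(implicit.equals("order_id"));

            // With an explicit locale the identifier stays pure ASCII.
            String explicit = normalize("ORDER_ID");
            System.out.println(explicit.equals("order_id"));
        } finally {
            // Restore the original default so nothing else is affected.
            Locale.setDefault(saved);
        }
    }
}
```

The first comparison is false under a Turkish default and the second is true. Passing Locale.ROOT instead of Locale.ENGLISH should behave the same way for ASCII input.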

That is not true of every other locale. In Turkish, the 'I' and 'i' characters are not case-converted to each other.

The Turkish alphabet, which is a variant of the Latin alphabet, includes two distinct versions of the letter I, one dotted and the other dotless.

In Unicode, U+0131 is a lower case letter dotless i (ı). U+0130 (İ) is capital i with dot. ISO-8859-9 has them at positions 0xFD and 0xDD respectively. In normal typography, when lower case i is combined with other diacritics, the dot is generally removed before the diacritic is added; however, Unicode still lists the equivalent combining sequences as including the dotted i, since logically it is the normal dotted i character that is being modified.

Most Unicode software uppercases ı to I and lowercases İ to i, but, unless specifically set up for Turkish, it lowercases I to i and uppercases i to I. Thus uppercasing then lowercasing, or vice versa, changes the letters.
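The round-trip problem described in the quote can be reproduced in Java. This is only a sketch: the class name `TurkishRoundTrip` and the helper `roundTrip` are made up for the example, and the Turkish locale is obtained with `Locale.forLanguageTag("tr")`.

```java
import java.util.Locale;

final class TurkishRoundTrip {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Uppercases and then lowercases a string under the given locale.
    static String roundTrip(String s, Locale locale) {
        return s.toUpperCase(locale).toLowerCase(locale);
    }

    public static void main(String[] args) {
        String dotlessI = "\u0131"; // LATIN SMALL LETTER DOTLESS I

        // Without Turkish rules: dotless i -> I -> i, so the letter changes.
        System.out.println(roundTrip(dotlessI, Locale.ROOT).equals("i"));

        // With Turkish rules: dotless i -> I -> dotless i, so it survives.
        System.out.println(roundTrip(dotlessI, TURKISH).equals(dotlessI));

        // Under Turkish rules 'I' and 'i' are not a case pair:
        // 'I' lowercases to U+0131 and 'i' uppercases to U+0130.
        System.out.println("I".toLowerCase(TURKISH).equals("\u0131"));
        System.out.println("i".toUpperCase(TURKISH).equals("\u0130"));
    }
}
```

All four comparisons evaluate to true.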

The list of special-case exceptions is maintained at http://unicode.org/Public/UNIDATA/SpecialCasing.txt

# ================================================================================

# Turkish and Azeri

# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.

0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
# This matches the behavior of the canonically equivalent I-dot_above

0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

...
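A short sketch of how the U+0130 rule in this excerpt surfaces in Java: with a Turkish or Azeri locale, İ lowercases to a plain "i", while with any other locale the String#toLowerCase Javadoc lists the result as "i" followed by U+0307 (COMBINING DOT ABOVE). The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.Locale;

final class DottedCapitalI {
    static final String CAPITAL_I_WITH_DOT = "\u0130";

    // Lowercases U+0130 under the locale named by the given language tag.
    static String lower(String languageTag) {
        return CAPITAL_I_WITH_DOT.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // The tr and az rules map U+0130 to a single U+0069.
        System.out.println(lower("tr").equals("i"));
        System.out.println(lower("az").equals("i"));

        // Outside those locales the result is two chars: 'i' + U+0307.
        String neutral = CAPITAL_I_WITH_DOT.toLowerCase(Locale.ROOT);
        System.out.println(neutral.equals("i\u0307"));
        System.out.println(neutral.length());
    }
}
```

One consequence worth noting is that lowercasing can change the length of a string, so code that assumes character positions are preserved across case conversion may break on such input.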

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/10336730/
