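In Java, the String#toLowerCase method uses the system default Locale to decide how to lowercase text. If I am lowercasing some ASCII text and want to be sure it is processed as expected, which locale should I use?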
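Edit: I am mainly concerned with programming identifiers, such as table and column names in a schema. So I want English lowercasing rules to apply.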
Locale.ROOT is documented as the language/country-neutral locale for locale-sensitive operations. Locale.ENGLISH is presumably also a safe choice.
Best answer
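Yes, Locale.ENGLISH is a safe choice for case operations on things like programming language identifiers and URL parts, because it involves no special casing rules, and every 7-bit ASCII character maps to another 7-bit ASCII character when case-converted under English rules.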
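A minimal sketch of this advice, assuming identifiers are normalized in one place: the class name IdentifierCase and the method normalize are hypothetical names used only for illustration. The point is that the locale is passed explicitly, so the result no longer depends on Locale.getDefault().

```java
import java.util.Locale;

class IdentifierCase {
    // Hypothetical helper: lowercases an identifier with a fixed locale,
    // so the JVM's default locale cannot change the result.
    static String normalize(String identifier) {
        return identifier.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // Simulate a machine configured for Turkish.
        Locale.setDefault(Locale.forLanguageTag("tr"));

        System.out.println(normalize("CUSTOMER_ID")); // prints customer_id
        System.out.println(normalize("TITLE"));       // prints title
    }
}
```

Calling the no-argument toLowerCase() in the same program would instead pick up the Turkish default set in main.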
The same does not hold for every locale. In Turkish, the "I" and "i" characters are not case-converted to each other.
The Turkish alphabet, which is a variant of the Latin alphabet, includes two distinct versions of the letter I, one dotted and the other dotless.
In Unicode, U+0131 is a lower case letter dotless i (ı). U+0130 (İ) is capital i with dot. ISO-8859-9 has them at positions 0xFD and 0xDD respectively. In normal typography, when lower case i is combined with other diacritics, the dot is generally removed before the diacritic is added; however, Unicode still lists the equivalent combining sequences as including the dotted i, since logically it is the normal dotted i character that is being modified.
Most Unicode software uppercases ı to I and lowercases İ to i, but, unless specifically set up for Turkish, it lowercases I to i and uppercases i to I. Thus uppercasing then lowercasing, or vice versa, changes the letters.
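The behavior described in the quote can be observed directly in Java. The following is a small sketch, not a definitive test suite; the class name TurkishCaseDemo and its helper methods are hypothetical. Results are printed as booleans and compared against Unicode escapes to avoid depending on console encoding.

```java
import java.util.Locale;

class TurkishCaseDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    static String lowerEnglish(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    // Uppercase then lowercase without Turkish rules.
    static String roundTrip(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // English rules: plain ASCII in, plain ASCII out.
        System.out.println(lowerEnglish("TITLE").equals("title"));       // true

        // Turkish rules: capital I becomes dotless i (U+0131).
        System.out.println(lowerTurkish("TITLE").equals("t\u0131tle"));  // true

        // Dotless i (U+0131) uppercases to I, which lowercases to dotted i,
        // so the round trip does not return the original letter.
        System.out.println(roundTrip("\u0131").equals("i"));             // true
    }
}
```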
The list of special-case exceptions is maintained at http://unicode.org/Public/UNIDATA/SpecialCasing.txt
# ================================================================================
# Turkish and Azeri
# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.
0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE
# When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
# This matches the behavior of the canonically equivalent I-dot_above
0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE
...
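To see how the SpecialCasing.txt rules quoted above surface in Java, the sketch below prints the code points produced by toLowerCase under Turkish rules and under Locale.ROOT. The class name SpecialCasingDemo and the helper codePoints are hypothetical; the example assumes a JVM whose casing tables follow these Unicode rules, which is the case for standard JDKs.

```java
import java.util.Locale;

class SpecialCasingDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Hypothetical helper: renders a string as space-separated U+XXXX code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Locale.ROOT keeps the formatting itself locale-independent.
            sb.append(String.format(Locale.ROOT, "U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Rule "0130; 0069; ...; tr": capital I with dot lowercases to plain i.
        System.out.println(codePoints("\u0130".toLowerCase(TURKISH)));     // U+0069

        // Rule "0307; ; ...; tr After_I": the combining dot after I is dropped.
        System.out.println(codePoints("I\u0307".toLowerCase(TURKISH)));    // U+0069

        // Without Turkish rules, the dot is preserved as a combining mark.
        System.out.println(codePoints("\u0130".toLowerCase(Locale.ROOT))); // U+0069 U+0307
    }
}
```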
For more on "java - Which Locale should I specify when calling String#toLowerCase?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/10336730/