java - Which Locale should I specify when calling String#toLowerCase?

Tags: java localization internationalization

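In Java, the no-argument String#toLowerCase method uses the system default Locale to decide how to lowercase. If I am lowercasing some ASCII text and want to be sure it is processed as expected, which Locale should I use?

Edit: I am mainly concerned with programming identifiers, such as table and column names in a schema. So I want English lowercasing rules to be applied.

Locale.ROOT states that it is the language/country neutral locale for locale-sensitive operations.

Locale.ENGLISH is presumably also a safe choice.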

Best answer

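Yes, Locale.ENGLISH is a safe choice for case operations on things like programming language identifiers and URL parts, because it involves no special casing rules, and under English rules every 7-bit ASCII character case-converts to another 7-bit ASCII character.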
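A minimal sketch of this advice, assuming a hypothetical helper called normalizeIdentifier (the class and method names are illustrative and do not come from the original answer). It passes an explicit Locale.ENGLISH so the result does not depend on the JVM default locale.

```java
import java.util.Locale;

final class IdentifierCase {
    // Hypothetical helper: lowercases an identifier with a fixed locale,
    // so the JVM default locale cannot influence the result.
    static String normalizeIdentifier(String name) {
        return name.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(normalizeIdentifier("USER_ID"));    // user_id
        System.out.println(normalizeIdentifier("OrderItems")); // orderitems
    }
}
```

For pure ASCII input, Locale.ROOT should give the same result, so either constant fits this use.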

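That is not true of every locale. In Turkish, "I" and "i" are not case-converted to each other: uppercase "I" lowercases to the dotless "ı", and lowercase "i" uppercases to the dotted "İ".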
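To make the difference concrete, here is a small sketch that runs the same strings through English and Turkish case conversion. The class name TurkishCaseDemo and its helper methods are made up for illustration; only String#toLowerCase(Locale), String#toUpperCase(Locale) and Locale.forLanguageTag are standard API.

```java
import java.util.Locale;

final class TurkishCaseDemo {
    // Locale for Turkish, built from its language tag.
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        System.out.println(lower("TITLE", Locale.ENGLISH)); // title
        System.out.println(lower("TITLE", TURKISH));        // t + U+0131 (dotless i) + tle
        System.out.println(upper("title", Locale.ENGLISH)); // TITLE
        System.out.println(upper("title", TURKISH));        // T + U+0130 (capital I with dot) + TLE
    }
}
```

An identifier such as a column name that contains "I" would therefore stop being ASCII if it were lowercased under a Turkish default locale.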

"Dotted and dotless I" explains:

The Turkish alphabet, which is a variant of the Latin alphabet, includes two distinct versions of the letter I, one dotted and the other dotless.

In Unicode, U+0131 is a lower case letter dotless i (ı). U+0130 (İ) is capital i with dot. ISO-8859-9 has them at positions 0xFD and 0xDD respectively. In normal typography, when lower case i is combined with other diacritics, the dot is generally removed before the diacritic is added; however, Unicode still lists the equivalent combining sequences as including the dotted i, since logically it is the normal dotted i character that is being modified.

Most Unicode software uppercases ı to I and lowercases İ to i, but, unless specifically set up for Turkish, it lowercases I to i and uppercases i to I. Thus uppercasing then lowercasing, or vice versa, changes the letters.
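The round-trip effect described in the quote can be checked in Java with a short sketch. The class and method names here are hypothetical, and Locale.ROOT stands in for "not specifically set up for Turkish".

```java
import java.util.Locale;

final class RoundTripDemo {
    // Uppercases and then lowercases using the locale-neutral Locale.ROOT.
    static String upperThenLower(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String dotless = "\u0131"; // LATIN SMALL LETTER DOTLESS I
        String result = upperThenLower(dotless);
        // The dotless i becomes "I" and then the ordinary dotted "i".
        System.out.println(result.equals(dotless)); // false
        System.out.println(result.equals("i"));     // true
    }
}
```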

The list of special-case exceptions is maintained at http://unicode.org/Public/UNIDATA/SpecialCasing.txt

# ================================================================================

# Turkish and Azeri

# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.

0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
# This matches the behavior of the canonically equivalent I-dot_above

0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

...
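As a reading aid for the excerpt above, here is a rough sketch of a parser for one data line. It assumes the field layout is code; lower; title; upper; optional condition list; then a "#" comment, which is how the lines above appear to be laid out. The class SpecialCasingEntry and its field names are hypothetical and are not part of any JDK or Unicode API.

```java
final class SpecialCasingEntry {
    final int codePoint;
    final String lower;
    final String title;
    final String upper;
    final String condition; // e.g. "tr" or "tr After_I"; empty when unconditional

    private SpecialCasingEntry(int codePoint, String lower, String title,
                               String upper, String condition) {
        this.codePoint = codePoint;
        this.lower = lower;
        this.title = title;
        this.upper = upper;
        this.condition = condition;
    }

    // Parses one data line; returns null for blank and comment-only lines.
    static SpecialCasingEntry parse(String line) {
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (data.isEmpty()) {
            return null;
        }
        String[] f = data.split(";", -1);
        if (f.length < 4) {
            throw new IllegalArgumentException("Expected at least 4 fields: " + line);
        }
        String condition = f.length > 4 ? f[4].trim() : "";
        return new SpecialCasingEntry(Integer.parseInt(f[0].trim(), 16),
                decode(f[1]), decode(f[2]), decode(f[3]), condition);
    }

    // Turns a space-separated list of hex code points into a String.
    // An empty field (as in the 0307 lowercase mapping) yields "".
    private static String decode(String field) {
        StringBuilder sb = new StringBuilder();
        for (String hex : field.trim().split("\\s+")) {
            if (!hex.isEmpty()) {
                sb.appendCodePoint(Integer.parseInt(hex, 16));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SpecialCasingEntry e = parse(
                "0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE");
        System.out.println("lower is empty: " + e.lower.isEmpty());
        System.out.println("condition: " + e.condition);
    }
}
```

Read this way, the 0130 lines say that in the tr and az locales U+0130 lowercases to U+0069 ("i"), and the 0307 lines say that a combining dot above is dropped when lowercasing after an "I".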

For "java - Which Locale should I specify when calling String#toLowerCase?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10336730/
