c - Alphabetical ordering according to my locale

Tags: c string

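I would like to know whether there is a function that compares two variables of type wchar_t alphabetically, after my locale has been set. In my case the code is

    setlocale(LC_ALL, "pl_PL.UTF-8");

Someone suggested wcscoll, but it does not help me much because it compares strings (that is, wchar_t *). If I have two variables wchar_t a, b, then calling wcscoll(&a, &b) does not work properly.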

Thank you very much for your help.

Best answer

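The short answer is "no". Locale-dependent collation is a function of two strings, not of two characters. You can collate two one-character strings, but that is not quite the same thing as comparing two individual characters.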

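As suggested, you use wcscoll to compare two wide strings. To turn a wchar_t into a wide string, you need to build a null-terminated wchar_t array (this is why passing &a directly does not work: a lone wchar_t has no terminator after it). So you could use a function like this: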

    #include <wchar.h>

    /* Collate two wide characters as one-character, null-terminated strings. */
    int wc_collate(wchar_t a, wchar_t b) {
      return wcscoll((wchar_t[2]){a, 0}, (wchar_t[2]){b, 0});
    }
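As a rough illustration of how such a helper might be used, the sketch below sorts an array of wide characters with qsort. The names cmp_wchar and sort_wchars are hypothetical and not part of any library, and the code assumes a C99 compiler (for the compound literals).

```c
#include <stdlib.h>
#include <wchar.h>

/* Same helper as in the answer: collate two one-character strings. */
int wc_collate(wchar_t a, wchar_t b) {
  return wcscoll((wchar_t[2]){a, 0}, (wchar_t[2]){b, 0});
}

/* qsort comparator (hypothetical name) adapting wc_collate to void pointers. */
static int cmp_wchar(const void *pa, const void *pb) {
  return wc_collate(*(const wchar_t *)pa, *(const wchar_t *)pb);
}

/* Sort n wide characters in place using the current LC_COLLATE rules. */
void sort_wchars(wchar_t *chars, size_t n) {
  qsort(chars, n, sizeof chars[0], cmp_wchar);
}
```

The ordering depends on whichever locale is active at the time of the call, so setlocale(LC_ALL, "pl_PL.UTF-8") should be called first, and its return value checked for NULL in case that locale is not installed.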

I could not find anything in the C standard to back up the claim in my first paragraph, but the POSIX rationale is more forthcoming:

The rules governing collation depend to some extent on the use. At least five different levels of increasingly complex collation rules can be distinguished:

  1. Byte/machine code order: This is the historical collation order in the UNIX system and many proprietary operating systems. Collation is here performed character by character, without any regard to context. The primary virtue is that it usually is quite fast and also completely deterministic; it works well when the native machine collation sequence matches the user expectations.

  2. Character order: On this level, collation is also performed character by character, without regard to context. The order between characters is, however, not determined by the code values, but on the expectations by the user of the "correct" order between characters. In addition, such a (simple) collation order can specify that certain characters collate equally (for example, uppercase and lowercase letters).

  3. String ordering: On this level, entire strings are compared based on relatively straightforward rules. Several "passes" may be required to determine the order between two strings. Characters may be ignored in some passes, but not in others; the strings may be compared in different directions; and simple string substitutions may be performed before strings are compared. This level is best described as "dictionary" ordering; it is based on the spelling, not the pronunciation, or meaning, of the words.

  4. Text search ordering: This is a further refinement of the previous level, best described as "telephone book ordering"; some common homonyms (words spelled differently but with the same pronunciation) are collated together; numbers are collated as if they were spelled out, and so on.

  5. Semantic-level ordering: Words and strings are collated based on their meaning; entire words (such as "the") are eliminated; the ordering is not deterministic. This usually requires special software and is highly dependent on the intended use.

While the historical collation order formally is at level 1, for the English language it corresponds roughly to elements at level 2. The user expects to see the output from the ls utility sorted very much as it would be in a dictionary. While telephone book ordering would be an optimal goal for standard collation, this was ruled out as the order would be language-dependent. Furthermore, a requirement was that the order must be determined solely from the text string and the collation rules; no external information (for example, "pronunciation dictionaries") could be required.

As a result, the goal for the collation support is at level 3.
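Since the collation support targets whole strings (level 3), the more natural use of wcscoll is to order complete words rather than single characters. The following is only a minimal sketch of that idea; cmp_wstr and sort_wstrings are hypothetical names introduced for illustration.

```c
#include <stdlib.h>
#include <wchar.h>

/* qsort comparator (hypothetical name): elements are pointers to wide strings. */
static int cmp_wstr(const void *pa, const void *pb) {
  const wchar_t *a = *(const wchar_t *const *)pa;
  const wchar_t *b = *(const wchar_t *const *)pb;
  return wcscoll(a, b);
}

/* Sort an array of n wide-string pointers using the current LC_COLLATE rules. */
void sort_wstrings(const wchar_t **strs, size_t n) {
  qsort(strs, n, sizeof strs[0], cmp_wstr);
}
```

As with any wcscoll call, the result reflects the locale in effect when the sort runs, so the same array may come out in a different order under a different LC_COLLATE setting.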

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/30417748/
