bash - Does sort -n handle ties predictably when the --stable option is not given? If so, how?

Tags: bash sorting language-lawyer lexicographic stable-sort

Here it looks like the space after the 3 in both rows breaks the numerical sorting and lets the alphabetic sorting kick in, so 11<2:

$ echo -e '3 2\n3 11' | sort -n
3 11
3 2

In man sort I read:

       -s, --stable
              stabilize sort by disabling last-resort comparison

This means that without -s, a last-resort comparison is performed (between ties, since -s does not affect lines that are not tied).

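So the question is: how is this last-resort comparison accomplished? A reference to the source code would be welcome, if necessary to answer the question.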

This answer on Unix infers from experiments that ties are ordered lexicographically.

Does the standard/POSIX say anything about this?

Best answer

Here it looks like the space after the 3 in both rows breaks the numerical sorting and lets the alphabetic sorting kick in

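sort -n is not sort -n -k1,1 -k2,2. sort -n interprets the whole line (not a field!) as a number, the way atoi("3 11") gives 3, and then sorts by those numbers. Since sort_them(atoi("3 11"), atoi("3 2")) cannot order the two lines (both are the number 3), the last-resort comparison kicks in.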

how is this last-resort comparison accomplished?

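The idea is that the whole lines are compared with strcmp or something similar (i.e. strcoll). Because 1 sorts before 2, strcmp("3 11", "3 2") puts 3 11 first. No options are taken into account; in particular, -n plays no part here.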

A reference to the source code would be welcome, if necessary to answer the question.

In GNU sort it is actually xmemcoll0, which takes collation into account, in compare (struct line const *a, struct line const *b) at coreutils sort.c#L2653; when LC_COLLATE is not set, memcmp is used as a fallback.

In OpenBSD I see it in openbsd/sort/coll.c#L528, around str_list_coll(struct bwstring *str1, struct sort_list_item **ss2), and also in list_coll_offset(), where, if all keys compare equal, top_level_str_coll is called, which simply compares the whole lines.

Does the standard/POSIX say anything about this?

If "this" refers to stable sorting and the last-resort comparison, then yes, it does. Let me quote the whole paragraph from POSIX sort, emphasis mine:

Comparisons shall be based on one or more sort keys extracted from each line of input (or, if no sort keys are specified, the entire line up to, but not including, the terminating <newline>), and shall be performed using the collating sequence of the current locale. If this collating sequence does not have a total ordering of all characters (see XBD LC_COLLATE), any lines of input that collate equally should be further compared byte-by-byte using the collating sequence for the POSIX locale.

Implementations are encouraged to perform the recommended further byte-by-byte comparison of lines that collate equally, even though this may affect efficiency. The impact on efficiency can be mitigated by only performing the additional comparison if the current locale's collating sequence does not have a total ordering of all characters (if the implementation provides a way to query this) or by only performing the additional comparison if the locale name associated with the LC_COLLATE category has an '@' modifier in the name (since locales without an '@' modifier should have a total ordering of all characters - see XBD LC_COLLATE). Note that if the implementation provides a stable sort option as an extension (usually -s), the additional comparison should not be performed when this option has been specified.

A similar question on Stack Overflow: https://stackoverflow.com/questions/65302655/
