python - Sorting algorithm used by pandas sort_values when the kind parameter is not applied

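In pandas' sort_values method, the kind parameter is only applied when sorting on a single column or label. Why is that? In the cases where kind is not applied, which sorting algorithm is used? Is it a stable sort?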

(For the documentation, see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html.)

Best answer

Here is the docstring of get_group_index_sorter(group_index, ngroups) from the pandas source file:

algos.groupsort_indexer implements `counting sort` and it is at least
O(ngroups), where
    ngroups = prod(shape)
    shape = map(len, keys)
that is, linear in the number of combinations (cartesian product) of unique
values of groupby keys. This can be huge when doing multi-key groupby.
np.argsort(kind='mergesort') is O(count x log(count)) where count is the
length of the data-frame;

Both algorithms are `stable` sort and that is necessary for correctness of
groupby operations. e.g. consider:
    df.groupby(key)[col].transform('first')
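To make the docstring concrete, here is a minimal NumPy sketch of a stable counting sort over integer group ids, compared against np.argsort(kind="mergesort"). counting_sort_indexer and choose_sorter are hypothetical helpers written for illustration; they mimic the idea the docstring attributes to algos.groupsort_indexer but are not pandas' actual Cython code. The switching rule in choose_sorter (counting sort when ngroups is smaller than count * log(count)) is an assumption modelled on the complexity comparison in the docstring.

```python
import numpy as np


def counting_sort_indexer(group_index, ngroups):
    """Stable counting sort over integer group ids in [0, ngroups).

    Sketch only: not pandas' algos.groupsort_indexer itself.
    """
    group_index = np.asarray(group_index, dtype=np.intp)
    # Pass 1: count rows per group, O(ngroups + count).
    counts = np.bincount(group_index, minlength=ngroups)
    # First output slot of each group (exclusive prefix sum of the counts).
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    sorter = np.empty(len(group_index), dtype=np.intp)
    # Pass 2: visit rows in original order, so rows with equal ids
    # keep their relative order (this is what makes the sort stable).
    for row, g in enumerate(group_index):
        sorter[starts[g]] = row
        starts[g] += 1
    return sorter


def choose_sorter(group_index, ngroups):
    """Hypothetical dispatcher: counting sort is O(ngroups + count),
    mergesort is O(count * log(count)); pick whichever looks cheaper."""
    count = len(group_index)
    if count > 0 and ngroups < count * np.log(count):
        return counting_sort_indexer(group_index, ngroups)
    return np.argsort(group_index, kind="mergesort")


ids = np.array([2, 0, 2, 1, 0, 2])
a = counting_sort_indexer(ids, ngroups=3)
b = np.argsort(ids, kind="mergesort")
print(a.tolist())  # [1, 4, 3, 0, 2, 5]
print(b.tolist())  # [1, 4, 3, 0, 2, 5]

# Stability is what makes "first" well defined: the first sorted slot of
# each group holds that group's earliest row in the original data.
group_starts = np.concatenate(([0], np.cumsum(np.bincount(ids))[:-1]))
print(a[group_starts].tolist())  # [1, 3, 0]
```

Both sorters return the same permutation because both are stable. The trade-off in the docstring is about cost: with three keys of 1,000 distinct values each, ngroups = 1000**3 = 10**9, while count * log(count) for a one-million-row frame is only about 1.4 * 10**7, so the mergesort path would be chosen.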

P.S. Here is the call chain:

pandas.core.frame.DataFrame.sort_values() -> \
  pandas.core.sorting.lexsort_indexer() ->  \
    pandas.core.sorting.indexer_from_factorized() -> \
      pandas.core.sorting.get_group_index_sorter()
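The chain above can be sketched roughly as follows, under stated assumptions: each sort key is factorized into sorted integer codes, the codes are combined into a single group id per row (a mixed-radix encoding whose size is the ngroups = prod(shape) from the docstring), and the ids are sorted with a stable algorithm. multi_key_sorter is a hypothetical helper, not a pandas function, and the exact internals of lexsort_indexer differ between pandas versions; the sketch only reproduces the resulting order for ascending keys without missing values.

```python
import numpy as np
import pandas as pd

# Toy frame with duplicate (a, b) pairs so that stability is visible.
df = pd.DataFrame({
    "a": [2, 1, 2, 1, 2],
    "b": ["x", "y", "x", "x", "y"],
    "row": [0, 1, 2, 3, 4],
})


def multi_key_sorter(frame, keys):
    """Hypothetical helper: factorize each key, combine the codes into
    one integer group id per row, then sort the ids stably."""
    group_index = np.zeros(len(frame), dtype=np.int64)
    for key in keys:
        # sort=True makes code order follow the sorted unique values.
        codes, uniques = pd.factorize(frame[key], sort=True)
        # Shift the previous keys by the number of distinct values here.
        group_index = group_index * len(uniques) + codes
    # Mergesort is stable: rows with equal ids keep their original order.
    return np.argsort(group_index, kind="mergesort")


order = multi_key_sorter(df, ["a", "b"])
print(df.iloc[order]["row"].tolist())  # [3, 1, 0, 2, 4]
```

Rows 0 and 2 share the key (2, "x") and stay in their original relative order, which matches what df.sort_values(["a", "b"]) returns for this frame.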

A similar question about the sorting algorithm used by pandas sort_values when the kind parameter is not applied can be found on Stack Overflow: https://stackoverflow.com/questions/44205655/
