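In Pandas' sort_values method, the kind parameter is only applied when sorting on a single column or label. Why is that? Which sorting algorithm is used in the cases where kind is not applied, and is it a stable sort?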
(For the documentation, see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html.)
Best answer
Here is the docstring of get_group_index_sorter(group_index, ngroups) from the pandas source file:
    algos.groupsort_indexer implements `counting sort` and it is at least
    O(ngroups), where
        ngroups = prod(shape)
        shape = map(len, keys)
    that is, linear in the number of combinations (cartesian product) of unique
    values of groupby keys. This can be huge when doing multi-key groupby.
    np.argsort(kind='mergesort') is O(count x log(count)) where count is the
    length of the data-frame;
    Both algorithms are `stable` sort and that is necessary for correctness of
    groupby operations. e.g. consider:
        df.groupby(key)[col].transform('first')
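To make the docstring concrete, here is a minimal NumPy sketch of the two ideas it relies on. It is my own illustration, not pandas code: group_index and counting_sort_indexer are hypothetical helpers, and the sketch assumes each key has already been factorized into integer codes 0..len-1. group_index folds the codes of several keys into one flat id (so there are ngroups = prod(shape) possible ids), and counting_sort_indexer is a stable counting sort over those ids.

```python
import numpy as np


def group_index(codes_list, shape):
    """Combine per-key integer codes into one flat group id.

    Mixed-radix encoding: the first key is the most significant digit, so
    ids sort in lexicographic key order and range over 0..prod(shape) - 1.
    """
    gi = np.zeros(len(codes_list[0]), dtype=np.int64)
    for codes, size in zip(codes_list, shape):
        gi = gi * size + np.asarray(codes, dtype=np.int64)
    return gi


def counting_sort_indexer(gi, ngroups):
    """Stable counting sort: positions that put gi in ascending order."""
    counts = np.bincount(gi, minlength=ngroups)  # one bucket per possible id
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))  # first slot of each bucket
    out = np.empty(len(gi), dtype=np.int64)
    for pos, g in enumerate(gi):  # left-to-right scan keeps ties in input order
        out[starts[g]] = pos
        starts[g] += 1
    return out


a = [1, 0, 1, 0]  # codes of key 1 (2 unique values)
b = [2, 1, 0, 1]  # codes of key 2 (3 unique values)
gi = group_index([a, b], (2, 3))
print(gi.tolist())                             # [5, 1, 3, 1]
print(counting_sort_indexer(gi, 6).tolist())   # [1, 3, 2, 0]
print(np.argsort(gi, kind="stable").tolist())  # [1, 3, 2, 0]
```

Because equal ids are written out in the order they are encountered, tied rows keep their original order, which is the stability property the docstring says groupby operations such as transform('first') depend on. The counts array has one entry per possible id, which is why this approach becomes expensive when the product of the key cardinalities is large and a mergesort-based argsort is preferable.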
PS: here is the "call chain" that a multi-column sort_values goes through (kind is not passed along it):
pandas.core.frame.DataFrame.sort_values() -> \
pandas.core.sorting.lexsort_indexer() -> \
pandas.core.sorting.indexer_from_factorized() -> \
pandas.core.sorting.get_group_index_sorter()
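A quick empirical check with the public API, assuming a reasonably recent pandas and NumPy (the data and the row column are made up for illustration): rows that tie on both sort keys come out in their original order, matching NumPy's stable np.lexsort. kind="quicksort" is passed only to show that it makes no difference for a multi-column sort; a six-row example illustrates stability but cannot prove it on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [2, 1, 2, 1, 2, 1],
    "b": [0, 1, 0, 0, 1, 1],
    "row": range(6),  # original position, to make the order of ties visible
})

# kind is accepted here, but per the docs it is only used for single-column sorts
out = df.sort_values(["a", "b"], kind="quicksort")
print(out["row"].tolist())  # [3, 1, 5, 0, 2, 4]

# Stable lexicographic sort in NumPy; the LAST key passed is the primary key
expected = np.lexsort((df["b"].to_numpy(), df["a"].to_numpy()))
print(expected.tolist())  # [3, 1, 5, 0, 2, 4]
```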
This Q&A comes from a similar question on Stack Overflow about the sorting algorithm Pandas' sort_values uses when the kind parameter is not applied: https://stackoverflow.com/questions/44205655/