python - Sort by index and value in multi-indexed data of a Pandas DataFrame

Tags: python python-3.x pandas sorting multi-index

Suppose I have a DataFrame like this:

   year  month message
0  2018      2    txt1
1  2017      4    txt2
2  2019      5    txt3
3  2017      5    txt5
4  2017      5    txt4
5  2020      4    txt3
6  2020      6    txt3
7  2020      6    txt3
8  2020      6    txt4
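
For anyone who wants to reproduce this, here is a minimal sketch that builds the same frame. The values are copied from the table above; the variable name `df` matches the snippets in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "year": [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4",
                "txt3", "txt3", "txt3", "txt4"],
})

print(df)
```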

For each year, I want to find the top three months by number of messages. So I grouped the data as follows:

df.groupby(['year','month']).count()

Result:

            message
year month
2017 4            1
     5            2
2018 2            1
2019 5            1
2020 4            1
     6            3

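Both index levels come out sorted in ascending order. How can I get the result shown below instead, where the data is sorted by year (ascending) and then by count (descending), keeping the top n values per year? The "month" level is free to appear in any order.

            message
year month
2017 5            2
     4            1
2018 2            1
2019 5            1
2020 6            3
     4            1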

Best answer

Within each group, value_counts sorts by count in descending order by default:

df.groupby('year')['month'].value_counts()

Output:

year  month
2017  5        2
      4        1
2018  2        1
2019  5        1
2020  6        3
      4        1
Name: month, dtype: int64
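
If you would rather keep the grouped DataFrame from the question (with its `message` column) instead of switching to a Series, one possible alternative is to sort the `count()` result directly. This is a sketch that is not part of the answer above; it relies on `DataFrame.sort_values` accepting index level names as well as column labels in `by`. The names `counts` and `ordered` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4",
                "txt3", "txt3", "txt3", "txt4"],
})

# Count messages per (year, month), as in the question.
counts = df.groupby(["year", "month"]).count()

# "year" is an index level and "message" is a column; both can be used as
# sort keys. Year ascending, message count descending.
ordered = counts.sort_values(["year", "message"], ascending=[True, False])

print(ordered)
```

On the sample data this puts month 5 before month 4 for 2017 and month 6 before month 4 for 2020, which is the order asked for in the question.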

If you only need the top 2 values for each year, group by year once more and take the head of each group:

(df.groupby('year')['month'].value_counts()
   .groupby('year').head(2)
)

Output (unchanged here, because no year in the sample data has more than two distinct months):

year  month
2017  5        2
      4        1
2018  2        1
2019  5        1
2020  6        3
      4        1
Name: month, dtype: int64
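
The same pattern generalizes to any n, for instance the top three the question asks for. Below is a minimal sketch that wraps it in a small helper; `top_n_months_per_year` is a hypothetical name introduced for illustration and is not a pandas function.

```python
import pandas as pd


def top_n_months_per_year(df, n):
    """Hypothetical helper: the n most frequent months for each year."""
    # value_counts orders each year's months by count, descending.
    counts = df.groupby("year")["month"].value_counts()
    # head(n) keeps the first n rows of every year and preserves that order.
    return counts.groupby(level="year").head(n)


# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2018, 2017, 2019, 2017, 2017, 2020, 2020, 2020, 2020],
    "month": [2, 4, 5, 5, 5, 4, 6, 6, 6],
    "message": ["txt1", "txt2", "txt3", "txt5", "txt4",
                "txt3", "txt3", "txt3", "txt4"],
})

top3 = top_n_months_per_year(df, 3)
top1 = top_n_months_per_year(df, 1)

print(top3)
print(top1)
```

With n=1 only the busiest month of each year remains. With n=3 the result matches the output above, since no year in the sample has three distinct months.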

A similar question about sorting by index and value in multi-indexed Pandas data can be found on Stack Overflow: https://stackoverflow.com/questions/60600042/
