python - Getting the max row from a multi-index table

I have a table that looks similar to this:

user_id  date  count
1        2020      5
         2021      7
2        2017      1
3        2020      2
         2019      1
         2021      3

I'm trying to keep only the row with the maximum count for each user_id, so it should look like this:

user_id  date  count
1        2021      7
2        2017      1
3        2021      3

I tried df.groupby(level=0).apply(max), but it drops the date column from the final table, and I'm not sure how to modify it to keep all three original columns.

Best answer

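You can select only the count column after .groupby(), then use .apply() to build a boolean Series that marks whether each entry equals the maximum count within its group. Then pass that boolean Series to .loc to select the matching rows of the full DataFrame.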

df.loc[df.groupby(level=0)['count'].apply(lambda x: x == x.max())]

Result:

         date  count
user_id             
1        2021      7
2        2017      1
3        2021      3

Note that if a user_id has multiple entries sharing the same maximum count, all of those entries are kept.
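The boolean mask described above can also be built with transform("max"), which broadcasts each group's maximum back onto the original rows. This is a sketch of an alternative, not the answer's own code. It assumes the sample data from the question, and the names max_per_user and mask are illustrative. It may be the more robust form on recent pandas versions, where an apply-based mask may come back with the group key prepended to its index and then fail to align in .loc.

```python
import pandas as pd

# Sample data from the question, with user_id as the (non-unique) row index.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3, 3, 3],
        "date": [2020, 2021, 2017, 2020, 2019, 2021],
        "count": [5, 7, 1, 2, 1, 3],
    }
).set_index("user_id")

# Broadcast each user's maximum count back to every row of that user.
max_per_user = df.groupby(level=0)["count"].transform("max")

# True where a row holds its group's maximum. Converting to a plain array
# avoids any index alignment on the duplicated user_id labels.
mask = (df["count"] == max_per_user).to_numpy()

result = df[mask]
print(result)
```

Because the comparison is row-wise, ties are kept here as well, just as with the apply-based mask.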

If several entries are tied for the maximum count and you want to keep only one entry per user_id, you can use the following logic instead:

df1 = df.reset_index()
df1.loc[df1.groupby('user_id')['count'].idxmax()].set_index('user_id')

Result:

         date  count
user_id             
1        2021      7
2        2017      1
3        2021      3

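Note that we cannot simply use df.loc[df.groupby(level=0)["count"].idxmax()], because user_id is the row index. That code just returns all rows unfiltered, like the original unprocessed DataFrame. This is because the index labels returned by idxmax() in that code are the user_id values themselves (rather than a simple RangeIndex 0, 1, 2, and so on). When .loc then looks up those user_id labels, it simply returns every entry under each user_id.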
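The pitfall described above can be reproduced with a short sketch. It assumes the sample data from the question. The variable names labels, unfiltered, positions, and filtered are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question, with user_id as the (non-unique) row index.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3, 3, 3],
        "date": [2020, 2021, 2017, 2020, 2019, 2021],
        "count": [5, 7, 1, 2, 1, 3],
    }
).set_index("user_id")

# With user_id as the index, idxmax() returns index labels, i.e. user_ids.
labels = df.groupby(level=0)["count"].idxmax()
print(labels.tolist())

# Looking those labels up returns every row that carries each user_id,
# so nothing is filtered out.
unfiltered = df.loc[labels]
print(len(unfiltered), len(df))

# After reset_index(), the labels are unique row positions, so idxmax()
# identifies exactly one row per user.
df1 = df.reset_index()
positions = df1.groupby("user_id")["count"].idxmax()
filtered = df1.loc[positions].set_index("user_id")
print(filtered)
```

The only difference between the two lookups is whether the index labels are unique, which is why reset_index() is needed before calling idxmax().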

Demo

Let's add more entries to the sample data and look at the difference between the two solutions:

Our base df (user_id is the row index):

         date  count
user_id             
1        2018      7                 <=== max1
1        2020      5
1        2021      7                 <=== max2
2        2017      1
3        2020      3                 <=== max1
3        2019      1
3        2021      3                 <=== max2

Result of the first solution:

df.loc[df.groupby(level=0)['count'].apply(lambda x: x == x.max())]


         date  count
user_id             
1        2018      7
1        2021      7
2        2017      1
3        2020      3
3        2021      3

Result of the second solution:

df1 = df.reset_index()
df1.loc[df1.groupby('user_id')['count'].idxmax()].set_index('user_id')


         date  count
user_id             
1        2018      7
2        2017      1
3        2020      3
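To rerun this demo end to end, the following sketch rebuilds the base df with ties and wraps both behaviors in two hypothetical helpers, keep_all_max and keep_first_max (names chosen here for illustration). Note one substitution: keep_all_max builds its mask with transform("max") instead of the answer's apply-based lambda, which selects the same rows.

```python
import pandas as pd


def keep_all_max(df: pd.DataFrame) -> pd.DataFrame:
    """Keep every row whose count equals its group's maximum (ties kept)."""
    max_per_group = df.groupby(level=0)["count"].transform("max")
    return df[(df["count"] == max_per_group).to_numpy()]


def keep_first_max(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per group: the first occurrence of the maximum count."""
    key = df.index.name
    flat = df.reset_index()  # unique 0..n-1 labels for idxmax()
    return flat.loc[flat.groupby(key)["count"].idxmax()].set_index(key)


# Base data from the demo: users 1 and 3 each have two rows tied for the max.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 3, 3, 3],
        "date": [2018, 2020, 2021, 2017, 2020, 2019, 2021],
        "count": [7, 5, 7, 1, 3, 1, 3],
    }
).set_index("user_id")

all_max = keep_all_max(df)
first_max = keep_first_max(df)
print(all_max)
print(first_max)
```

idxmax() reports the first occurrence of the maximum, which is why the second result keeps the 2018 and 2020 rows rather than the 2021 ones.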

Regarding "python - Getting the max row from a multi-index table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68987574/
