python - How to remove the MultiIndex from GroupBy.apply()?

import pandas
df = pandas.DataFrame([[2001, "Jack", 77], [2005, "Jack", 44], [2001, "Jill", 93]], columns=['Year', 'Name', 'Value'])

    Year    Name    Value
0   2001    Jack    77
1   2005    Jack    44
2   2001    Jill    93
For each unique Name, I would like to keep the row with the largest Year value. In the example above, I would like to get this table:
    Year    Name    Value
0   2005    Jack    44
1   2001    Jill    93
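For reference, the stated goal (one row per Name, namely the one with the largest Year) can also be expressed without apply(), by using idxmax to get the index label of each group's maximum Year. This is a swapped-in technique and not the approach from the question. Note that the lambda in the question sorts by Value rather than by Year, which happens to select the same rows for this data. If two rows of a group tie on Year, idxmax returns the first of them.

```python
import pandas as pd

df = pd.DataFrame(
    [[2001, "Jack", 77], [2005, "Jack", 44], [2001, "Jill", 93]],
    columns=["Year", "Name", "Value"],
)

# For each Name, the index label of the row with the largest Year
best_rows = df.groupby("Name")["Year"].idxmax()

# Select those rows and renumber them 0..n-1, as in the desired table
result = df.loc[best_rows.tolist()].reset_index(drop=True)
print(result)
```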

I tried to solve this with groupby + apply:

df.groupby('Name', as_index=False)\
     .apply(lambda x: x.sort_values('Value').head(1))
     Year  Name  Value
0 1  2005  Jack     44
1 2  2001  Jill     93

This isn't the best approach, but I'm more interested in what is happening and why. The result has a MultiIndex like this:

MultiIndex(levels=[[0, 1], [1, 2]],
           labels=[[0, 1], [0, 1]])

I'm not looking for a workaround. What I really want to know is why this happens and how to prevent it without changing my approach.
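A rough way to see where the extra level comes from: apply() calls the function once per group and then concatenates the per-group results, labelling each piece with its group, much like pd.concat(..., keys=...). The sketch below reproduces that by hand. It is an approximation of the behaviour, not pandas' internal code, and `pick` is a hypothetical name for the question's lambda. With the default as_index=True the label is the group name; the question's as_index=False output instead shows integer group positions (as in the pandas versions of that time), which corresponds to the `by_position` variant.

```python
import pandas as pd

df = pd.DataFrame(
    [[2001, "Jack", 77], [2005, "Jack", 44], [2001, "Jill", 93]],
    columns=["Year", "Name", "Value"],
)

def pick(group):
    # Hypothetical stand-in for the question's lambda: lowest Value first, keep one row
    return group.sort_values("Value").head(1)

# Run the function on each group, remembering the group name
names, pieces = [], []
for name, group in df.groupby("Name"):
    names.append(name)
    pieces.append(pick(group))

# Concatenating with keys prepends an outer level that identifies each piece
by_name = pd.concat(pieces, keys=names)                         # outer level: 'Jack', 'Jill'
by_position = pd.concat(pieces, keys=list(range(len(pieces))))  # outer level: 0, 1

# Concatenating without keys keeps only each row's original index label
flat = pd.concat(pieces)

print(by_name.index.tolist())      # [('Jack', 1), ('Jill', 2)]
print(by_position.index.tolist())  # [(0, 1), (1, 2)]
print(flat.index.tolist())         # [1, 2]
```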

Best answer

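IIUC, use group_keys=False. By default, apply() adds the group keys as an outer index level so that each row can be traced back to its group; group_keys=False skips that, so each row keeps only its original index label: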

df.groupby('Name', group_keys=False).apply(lambda x: x.sort_values('Value').head(1))

Output:

   Year  Name  Value
1  2005  Jack     44
2  2001  Jill     93
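A small sketch comparing the answer's fix with removing the level after the fact via DataFrame.droplevel. `pick` is a hypothetical name for the lambda. On pandas 2.2 and later, apply() may warn that it operates on the grouping column, and per that deprecation notice later versions exclude the Name column from what the function receives; the row labels checked here should be the same either way.

```python
import pandas as pd

df = pd.DataFrame(
    [[2001, "Jack", 77], [2005, "Jack", 44], [2001, "Jill", 93]],
    columns=["Year", "Name", "Value"],
)

def pick(group):
    # Per-group function from the question: keep the row with the lowest Value
    return group.sort_values("Value").head(1)

# The answer's fix: don't prepend the group keys to the result index
flat = df.groupby("Name", group_keys=False).apply(pick)

# Default behaviour: the group keys become the outer index level
nested = df.groupby("Name").apply(pick)

# After-the-fact alternative: drop the outer (group key) level
flat_again = nested.droplevel(0)

print(flat.index.tolist())        # [1, 2]
print(nested.index.tolist())      # [('Jack', 1), ('Jill', 2)]
print(flat_again.index.tolist())  # [1, 2]
```

group_keys=False avoids building the extra level at all, while droplevel(0) is handy when the grouped result comes from code you don't control.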

Regarding python - How to remove the MultiIndex from GroupBy.apply()?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46678142/

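python - Add a new column that appends all capitalized words in a phrase to a list for each row

pandas - Plotting a Pandas DataFrame that contains NaN

python - OLS with Pandas: datetime index as predictor

python - Why does my hdf5 file seem so large?

python - Add rows to a DataFrame to make group lengths equal

Python 2.7 with the Adwords API: ImportError: cannot import name AdWordsClient

python - Removing a border from an image with Python OpenCV

python - How to assign values to a column using a MultiIndex filter?

python - Add a constant time to a datetime column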