python - Pandas groupby : get max value in a subgroup

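I have a large dataset grouped by col, row, year, potveg, and total. I am trying to get the maximum of the 'total' column for each year within a group. That is, for the following dataset: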

col      row    year    potveg  total

-125.0  42.5    2015    9       697.3
                2015    13      535.2
                2015    15      82.3
                2016    9       907.8
                2016    13      137.6
                2016    15      268.4
                2017    9       961.9
                2017    13      74.2
                2017    15      248.0
                2018    9       937.9
                2018    13      575.6
                2018    15      215.5
-135.0  70.5    2015    8       697.3
                2015    10      535.2
                2015    19      82.3
                2016    8       907.8
                2016    10      137.6
                2016    19      268.4
                2017    8       961.9
                2017    10      74.2
                2017    19      248.0
                2018    8       937.9
                2018    10      575.6
                2018    19      215.5

I would like the output to look like this:

col      row    year    potveg  total

-125.0  42.5    2015    9       697.3
                2016    9       907.8
                2017    9       961.9
                2018    9       937.9
-135.0  70.5    2015    8       697.3
                2016    8       907.8
                2017    8       961.9
                2018    8       937.9

I tried this:

df.groupby(['col', 'row', 'year', 'potveg']).agg({'total': 'max'})

And this:

df.groupby(['col', 'row', 'year', 'potveg'])['total'].max()

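But neither seems to work, because the output has too many rows. I think the problem is the 'potveg' column, which is a subgroup. I don't know how to select the row that contains the maximum 'total' value.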
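A minimal sketch of why both attempts return every row, using a six-row excerpt of the data above. The names per_row and per_year are illustrative and not from the original post.

```python
import pandas as pd

# Six-row excerpt of the question's data (one location, two years).
df = pd.DataFrame(
    {
        "col": [-125.0] * 6,
        "row": [42.5] * 6,
        "year": [2015, 2015, 2015, 2016, 2016, 2016],
        "potveg": [9, 13, 15, 9, 13, 15],
        "total": [697.3, 535.2, 82.3, 907.8, 137.6, 268.4],
    }
)

# With potveg among the keys, every row is its own group,
# so max() simply echoes each row's total back.
per_row = df.groupby(["col", "row", "year", "potveg"])["total"].max()

# Without potveg there is one maximum per (col, row, year),
# but the potveg that produced it is no longer part of the result.
per_year = df.groupby(["col", "row", "year"])["total"].max()

print(len(per_row), len(df))  # the two lengths are equal
print(per_year)
```

So dropping potveg from the keys fixes the row count, but a plain max() cannot report which potveg the maximum belongs to. Selecting the whole row is what is needed.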

Best answer

One possible solution is to use .idxmax() inside groupby.apply:

print(
    df.groupby(["col", "row", "year"], as_index=False, sort=False).apply(
        lambda x: x.loc[x["total"].idxmax()]
    )
)

Prints:

     col   row    year  potveg  total
0 -125.0  42.5  2015.0     9.0  697.3
1 -125.0  42.5  2016.0     9.0  907.8
2 -125.0  42.5  2017.0     9.0  961.9
3 -125.0  42.5  2018.0     9.0  937.9
4 -135.0  70.5  2015.0     8.0  697.3
5 -135.0  70.5  2016.0     8.0  907.8
6 -135.0  70.5  2017.0     8.0  961.9
7 -135.0  70.5  2018.0     8.0  937.9
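In the output above, year and potveg appear as floats because each group's row is returned as a single Series of mixed integer and float values. As a hedged alternative (not part of the original answer), the labels from idxmax() can be passed to .loc directly, which keeps the integer columns as they are. The sketch below assumes the index labels are unique and uses a twelve-row excerpt of the data. The names best_idx and result are illustrative.

```python
import pandas as pd

# Twelve-row excerpt: two locations, years 2015 and 2016.
df = pd.DataFrame(
    {
        "col": [-125.0] * 6 + [-135.0] * 6,
        "row": [42.5] * 6 + [70.5] * 6,
        "year": [2015, 2015, 2015, 2016, 2016, 2016] * 2,
        "potveg": [9, 13, 15, 9, 13, 15, 8, 10, 19, 8, 10, 19],
        "total": [697.3, 535.2, 82.3, 907.8, 137.6, 268.4] * 2,
    }
)

# For each (col, row, year) group, find the index label of the
# row with the largest total. Assumes the index has no duplicates.
best_idx = df.groupby(["col", "row", "year"], sort=False)["total"].idxmax()

# Select those rows whole, so potveg comes along with total.
result = df.loc[best_idx].reset_index(drop=True)
print(result)
```

If several rows in a group tie for the maximum, idxmax() returns only the first of them.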

DataFrame used:

       col   row  year potveg  total
0   -125.0  42.5  2015      9  697.3
1   -125.0  42.5  2015     13  535.2
2   -125.0  42.5  2015     15   82.3
3   -125.0  42.5  2016      9  907.8
4   -125.0  42.5  2016     13  137.6
5   -125.0  42.5  2016     15  268.4
6   -125.0  42.5  2017      9  961.9
7   -125.0  42.5  2017     13   74.2
8   -125.0  42.5  2017     15  248.0
9   -125.0  42.5  2018      9  937.9
10  -125.0  42.5  2018     13  575.6
11  -125.0  42.5  2018     15  215.5
12  -135.0  70.5  2015      8  697.3
13  -135.0  70.5  2015     10  535.2
14  -135.0  70.5  2015     19   82.3
15  -135.0  70.5  2016      8  907.8
16  -135.0  70.5  2016     10  137.6
17  -135.0  70.5  2016     19  268.4
18  -135.0  70.5  2017      8  961.9
19  -135.0  70.5  2017     10   74.2
20  -135.0  70.5  2017     19  248.0
21  -135.0  70.5  2018      8  937.9
22  -135.0  70.5  2018     10  575.6
23  -135.0  70.5  2018     19  215.5
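For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a sketch that assumes plain integer and float columns with a default integer index. The original post does not show how the frame was constructed.

```python
import pandas as pd

years = [2015, 2016, 2017, 2018]

# The twelve totals are the same for both locations in the sample data.
totals = [
    697.3, 535.2, 82.3,
    907.8, 137.6, 268.4,
    961.9, 74.2, 248.0,
    937.9, 575.6, 215.5,
]

df = pd.DataFrame(
    {
        "col": [-125.0] * 12 + [-135.0] * 12,
        "row": [42.5] * 12 + [70.5] * 12,
        # Each year appears three times per location, once per potveg.
        "year": [y for y in years for _ in range(3)] * 2,
        # The first location uses potveg 9/13/15, the second 8/10/19.
        "potveg": [9, 13, 15] * 4 + [8, 10, 19] * 4,
        "total": totals * 2,
    }
)

print(df)
```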

For python - Pandas groupby : get max value in a subgroup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73241761/
