python - How to find the highest value in a DataFrame using the groupby function

Tags: python pandas dataframe

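I have the following dataset. For each app used in the study, I want to find which quarter of which year produced the highest number of installs.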

        Installs          CR     Month  Year             Category
0         10000    Everyone  January   2018       ART_AND_DESIGN
1        500000    Everyone  January   2018       ART_AND_DESIGN
2       5000000    Everyone   August   2018       ART_AND_DESIGN
3      50000000        Teen     June   2018       ART_AND_DESIGN
4        100000    Everyone     June   2018       ART_AND_DESIGN
        ...         ...       ...   ...                  ...
10836      5000    Everyone     July   2017               FAMILY
10837       100    Everyone     July   2018               FAMILY
10838      1000    Everyone  January   2017              MEDICAL
10839      1000  Mature 17+  January   2015  BOOKS_AND_REFERENCE
10840  10000000    Everyone     July   2018            LIFESTYLE
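To try the answers below locally, here is a minimal sketch that rebuilds only the ten rows printed above. It assumes `Installs` is already an integer column, as the printout suggests. The rows hidden behind `...` are not reproduced, so any result computed on this frame covers just these rows.

```python
import pandas as pd

# Only the rows printed in the question; the rows elided as "..." are not included.
# Assumption: Installs is stored as plain integers, as it appears in the printout.
df = pd.DataFrame(
    {
        "Installs": [10000, 500000, 5000000, 50000000, 100000,
                     5000, 100, 1000, 1000, 10000000],
        "CR": ["Everyone", "Everyone", "Everyone", "Teen", "Everyone",
               "Everyone", "Everyone", "Everyone", "Mature 17+", "Everyone"],
        "Month": ["January", "January", "August", "June", "June",
                  "July", "July", "January", "January", "July"],
        "Year": [2018, 2018, 2018, 2018, 2018, 2017, 2018, 2017, 2015, 2018],
        "Category": ["ART_AND_DESIGN"] * 5
                    + ["FAMILY", "FAMILY", "MEDICAL", "BOOKS_AND_REFERENCE", "LIFESTYLE"],
    },
    # Keep the original row labels so the frame matches the printout
    index=[0, 1, 2, 3, 4, 10836, 10837, 10838, 10839, 10840],
)
print(df)
```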

Best answer

If you need the maximum number of installs per quarter and category, use:

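import pandas as pd

# Concatenate month name and year (e.g. "June2018"), parse it, and convert to a quarterly period
q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
       .dt.to_period('Q').rename('Quarter'))

# Maximum installs per (quarter, category); a new name keeps df intact for the next snippet
out = df.groupby([q,'Category'])['Installs'].max().reset_index()
print(out)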
  Quarter             Category  Installs
0  2015Q1  BOOKS_AND_REFERENCE      1000
1  2017Q1              MEDICAL      1000
2  2017Q3               FAMILY      5000
3  2018Q1       ART_AND_DESIGN    500000
4  2018Q2       ART_AND_DESIGN  50000000
5  2018Q3       ART_AND_DESIGN   5000000
6  2018Q3               FAMILY       100
7  2018Q3            LIFESTYLE  10000000
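The quarter key `q` works because `%B` parses a full month name and `%Y` a four-digit year, so a string like `"June2018"` becomes a timestamp that `.dt.to_period('Q')` maps to its calendar quarter. A small sketch on a few hand-picked month/year pairs from the question (the `months` and `years` Series are illustrative only):

```python
import pandas as pd

# Illustrative month/year pairs taken from the question's rows
months = pd.Series(["January", "June", "July", "August"])
years = pd.Series([2018, 2018, 2017, 2018])

# "%B%Y" matches strings such as "June2018": full month name directly followed by the year
dates = pd.to_datetime(months + years.astype(str), format="%B%Y")

# Convert each timestamp to a quarterly period (calendar quarters ending in December)
quarters = dates.dt.to_period("Q")
print(quarters.astype(str).tolist())  # ['2018Q1', '2018Q2', '2017Q3', '2018Q3']
```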

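Alternatively, if you need to sum the installs per quarter and category, and then get the quarter with the most installs for each category, use: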

q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
       .dt.to_period('Q').rename('Quarter'))

df1 = df.groupby([q,'Category'])['Installs'].sum().reset_index()
print (df1)
  Quarter             Category  Installs
0  2015Q1  BOOKS_AND_REFERENCE      1000
1  2017Q1              MEDICAL      1000
2  2017Q3               FAMILY      5000
3  2018Q1       ART_AND_DESIGN    510000
4  2018Q2       ART_AND_DESIGN  50100000
5  2018Q3       ART_AND_DESIGN   5000000
6  2018Q3               FAMILY       100
7  2018Q3            LIFESTYLE  10000000

df2 = df1.loc[df1.groupby('Category')['Installs'].idxmax()]
print (df2)
  Quarter             Category  Installs
4  2018Q2       ART_AND_DESIGN  50100000
0  2015Q1  BOOKS_AND_REFERENCE      1000
2  2017Q3               FAMILY      5000
7  2018Q3            LIFESTYLE  10000000
1  2017Q1              MEDICAL      1000
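A different technique from the `idxmax` step above (swapped in here, not part of the answer) is to sort the summed table by `Installs` in descending order and keep the first row per category with `drop_duplicates`. The sketch below rebuilds only the ten visible rows so it runs on its own; `top` is a hypothetical name. Within a category, if two quarters tie, the kept row may differ from `idxmax`, which returns the first occurrence.

```python
import pandas as pd

# Only the ten rows printed in the question (the "..." rows are not included)
df = pd.DataFrame({
    "Installs": [10000, 500000, 5000000, 50000000, 100000,
                 5000, 100, 1000, 1000, 10000000],
    "Month": ["January", "January", "August", "June", "June",
              "July", "July", "January", "January", "July"],
    "Year": [2018, 2018, 2018, 2018, 2018, 2017, 2018, 2017, 2015, 2018],
    "Category": ["ART_AND_DESIGN"] * 5
                + ["FAMILY", "FAMILY", "MEDICAL", "BOOKS_AND_REFERENCE", "LIFESTYLE"],
})

# Same quarter key and per-(quarter, category) sum as in the answer
q = (pd.to_datetime(df["Month"] + df["Year"].astype(str), format="%B%Y")
       .dt.to_period("Q").rename("Quarter"))
df1 = df.groupby([q, "Category"])["Installs"].sum().reset_index()

# Largest sums first, then keep the first (largest) row for each category
top = (df1.sort_values("Installs", ascending=False)
          .drop_duplicates("Category")
          .sort_values("Category"))
print(top)
```

On these rows it selects the same quarters as the `idxmax` version.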

A similar question, "python - How to find the highest value in a DataFrame using the groupby function", is on Stack Overflow: https://stackoverflow.com/questions/59588904/
