我有以下数据集。我想查找研究中使用的每个应用在哪一年的季度中产生了最高的安装次数?
Installs CR Month Year Category
0 10000 Everyone January 2018 ART_AND_DESIGN
1 500000 Everyone January 2018 ART_AND_DESIGN
2 5000000 Everyone August 2018 ART_AND_DESIGN
3 50000000 Teen June 2018 ART_AND_DESIGN
4 100000 Everyone June 2018 ART_AND_DESIGN
... ... ... ... ...
10836 5000 Everyone July 2017 FAMILY
10837 100 Everyone July 2018 FAMILY
10838 1000 Everyone January 2017 MEDICAL
10839 1000 Mature 17+ January 2015 BOOKS_AND_REFERENCE
10840 10000000 Everyone July 2018 LIFESTYLE
最佳答案
如果需要每季度和类别
的最大值,请使用:
q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
.dt.to_period('Q').rename('Quarter'))
df = df.groupby([q,'Category'])['Installs'].max().reset_index()
print (df)
Quarter Category Installs
0 2015Q1 BOOKS_AND_REFERENCE 1000
1 2017Q1 MEDICAL 1000
2 2017Q3 FAMILY 5000
3 2018Q1 ART_AND_DESIGN 500000
4 2018Q2 ART_AND_DESIGN 50000000
5 2018Q3 ART_AND_DESIGN 5000000
6 2018Q3 FAMILY 100
或者,如果需要按季度和类别汇总安装
并获取最大安装
的查询者,则使用:
q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
.dt.to_period('Q').rename('Quarter'))
df1 = df.groupby([q,'Category'])['Installs'].sum().reset_index()
print (df1)
Quarter Category Installs
0 2015Q1 BOOKS_AND_REFERENCE 1000
1 2017Q1 MEDICAL 1000
2 2017Q3 FAMILY 5000
3 2018Q1 ART_AND_DESIGN 510000
4 2018Q2 ART_AND_DESIGN 50100000
5 2018Q3 ART_AND_DESIGN 5000000
6 2018Q3 FAMILY 100
7 2018Q3 LIFESTYLE 10000000
df2 = df1.loc[df1.groupby('Category')['Installs'].idxmax()]
print (df2)
Quarter Category Installs
4 2018Q2 ART_AND_DESIGN 50100000
0 2015Q1 BOOKS_AND_REFERENCE 1000
2 2017Q3 FAMILY 5000
7 2018Q3 LIFESTYLE 10000000
1 2017Q1 MEDICAL 1000
关于python - 如何使用 Groupby 函数查找数据框中的最高值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59588904/