I have this example df:
import pandas as pd

df3 = pd.DataFrame({
    'Customer': ['Sara', 'John', 'Didi', 'Sara', 'Didi', 'Didi'],
    'Date': ['15-12-2021', '1-1-2022', '1-3-2022', '15-3-2022', '1-1-2022', '1-4-2022'],
    'Month': ['December-2021', 'January-2022', 'March-2022', 'March-2022', 'January-2022', 'April-2022'],
    'Product': ['grocery', 'electronics', 'personal-care', 'grocery', 'electronics', 'personal-care'],
    'status': ['purchased', 'refunded', 'refunded', 'refunded', 'purchased', 'refunded']
})
df3
which gives:
   Customer        Date          Month        Product     status
0      Sara  15-12-2021  December-2021        grocery  purchased
1      John    1-1-2022   January-2022    electronics   refunded
2      Didi    1-3-2022     March-2022  personal-care   refunded
3      Sara   15-3-2022     March-2022        grocery   refunded
4      Didi    1-1-2022   January-2022    electronics  purchased
5      Didi    1-4-2022     April-2022  personal-care   refunded
I am trying to group by customer, product, and month and get the first status, and then I want the groups sorted by the Month column:
df3.sort_values('Month').groupby(['Customer','Product','Month','Date']).agg({'status':'first'}).reset_index()
I got:
   Customer        Product          Month        Date     status
0      Didi    electronics   January-2022    1-1-2022  purchased
1      Didi  personal-care     April-2022    1-4-2022   refunded
2      Didi  personal-care     March-2022    1-3-2022   refunded
3      John    electronics   January-2022    1-1-2022   refunded
4      Sara        grocery  December-2021  15-12-2021  purchased
5      Sara        grocery     March-2022   15-3-2022   refunded
I expected the rows at index 1 and 2 to be in the opposite order, with March before April, so what I tried was:
months = {'December-2021':0,'January-2022':1,'February-2022':2,'March-2022':3,'April-2022':4,'May-2022':5,'June-2022':6,'July-2022':7,'August-2022':8,'September-2022':9,'October-2022':10,'November-2022':11}
and then mapping it through sort_values:
df3.sort_values(by=['Month'], key=lambda x: x.map(months)).groupby(['Customer','Product','Month','Date']).agg({'status':'first'}).reset_index()
But I got exactly the same result, still without the correct order.

Best answer
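The problem is that Month is being sorted as a string, and alphabetically April comes before March. Sorting before the groupby does not help on its own either, because groupby sorts the group keys again by default (sort=True), which discards the order you set up. You have to convert the strings to dates first, then sort the entries and pass sort=False to groupby. For example, like this: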
# Convert column Month to datetime
df3['Month'] = pd.to_datetime(df3['Month'], format='%B-%Y')
# Do your groupby
df_group = df3.sort_values('Month').groupby(['Customer','Product','Month','Date'], sort=False).first().reset_index()
# Convert column Month back to string
df_group['Month'] = df_group['Month'].dt.strftime('%B-%Y')
df_group
Output:
   Customer        Product          Month        Date     status
0      Sara        grocery  December-2021  15-12-2021  purchased
1      Didi    electronics   January-2022    1-1-2022  purchased
2      John    electronics   January-2022    1-1-2022   refunded
3      Didi  personal-care     March-2022    1-3-2022   refunded
4      Sara        grocery     March-2022   15-3-2022   refunded
5      Didi  personal-care     April-2022    1-4-2022   refunded
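
As a possible variation (a sketch, not part of the original answer): the key= approach from the question can also work once sort=False is passed to groupby, so the Month column never has to be converted and converted back. The helper name month_key is hypothetical, and kind='mergesort' is an added assumption so that rows within the same month keep their original relative order. It also assumes the month names are parsed under an English locale.

```python
import pandas as pd

df3 = pd.DataFrame({
    'Customer': ['Sara', 'John', 'Didi', 'Sara', 'Didi', 'Didi'],
    'Date': ['15-12-2021', '1-1-2022', '1-3-2022', '15-3-2022', '1-1-2022', '1-4-2022'],
    'Month': ['December-2021', 'January-2022', 'March-2022', 'March-2022', 'January-2022', 'April-2022'],
    'Product': ['grocery', 'electronics', 'personal-care', 'grocery', 'electronics', 'personal-care'],
    'status': ['purchased', 'refunded', 'refunded', 'refunded', 'purchased', 'refunded']
})

# Hypothetical helper: parse the 'Month' strings into timestamps,
# used only as a sort key; the column itself stays a string.
def month_key(col):
    return pd.to_datetime(col, format='%B-%Y')

result = (
    df3.sort_values('Month', key=month_key, kind='mergesort')  # chronological, stable
       # sort=False keeps the groups in order of first appearance,
       # i.e. the chronological order produced by sort_values above
       .groupby(['Customer', 'Product', 'Month', 'Date'], sort=False)
       .agg({'status': 'first'})
       .reset_index()
)
print(result)
```

The only change compared with the attempt in the question is sort=False (plus the stable sort); without it, groupby re-sorts the string keys alphabetically and the chronological order is lost.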
Regarding "python - sorting category values within a groupby in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71352723/