python - Sorting category values within a groupby in pandas

Tags: python pandas numpy sorting

I have this example df:

import pandas as pd

df3 = pd.DataFrame({'Customer': ['Sara', 'John', 'Didi', 'Sara', 'Didi', 'Didi'],
                    'Date': ['15-12-2021', '1-1-2022', '1-3-2022', '15-3-2022', '1-1-2022', '1-4-2022'],
                    'Month': ['December-2021', 'January-2022', 'March-2022', 'March-2022', 'January-2022', 'April-2022'],
                    'Product': ['grocery', 'electronics', 'personal-care', 'grocery', 'electronics', 'personal-care'],
                    'status': ['purchased', 'refunded', 'refunded', 'refunded', 'purchased', 'refunded']
                    })

df3

This gives:

    Customer    Date        Month            Product          status
0   Sara        15-12-2021  December-2021    grocery          purchased
1   John        1-1-2022    January-2022     electronics      refunded
2   Didi        1-3-2022    March-2022       personal-care    refunded
3   Sara        15-3-2022   March-2022       grocery          refunded
4   Didi        1-1-2022    January-2022     electronics      purchased
5   Didi        1-4-2022    April-2022       personal-care    refunded

I am trying to group by customer, product and month and take the first status, and then I want the groups sorted by the Month column:

df3.sort_values('Month').groupby(['Customer','Product','Month','Date']).agg({'status':'first'}).reset_index()

I get:

    Customer    Product          Month            Date         status
0   Didi        electronics      January-2022     1-1-2022     purchased
1   Didi        personal-care    April-2022       1-4-2022     refunded
2   Didi        personal-care    March-2022       1-3-2022     refunded
3   John        electronics      January-2022     1-1-2022     refunded
4   Sara        grocery          December-2021    15-12-2021   purchased
5   Sara        grocery          March-2022       15-3-2022    refunded

I expected the rows at index 1 and 2 to be in the opposite order, with March before April, so I tried defining an explicit order:

months = {'December-2021':0,'January-2022':1,'February-2022':2,'March-2022':3,'April-2022':4,'May-2022':5,'June-2022':6,'July-2022':7,'August-2022':8,'September-2022':9,'October-2022':10,'November-2022':11}

and then mapping it through the sort key of sort_values:

df3.sort_values(by=['Month'], key=lambda x: x.map(months)).groupby(['Customer','Product','Month','Date']).agg({'status':'first'}).reset_index()

But I get exactly the same result, still not in the correct order.

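Best answer

The problem is that the Month column holds strings, so it is ordered alphabetically, and "April" comes before "March". In addition, groupby sorts the group keys by default (sort=True), which discards whatever order sort_values produced; that is why your key mapping had no visible effect. Convert the strings to dates first, sort the rows, and pass sort=False to groupby. For example: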

# Convert column Month to datetime
df3['Month'] = pd.to_datetime(df3['Month'], format='%B-%Y')

# Do your groupby
df_group = df3.sort_values('Month').groupby(['Customer','Product','Month','Date'], sort=False).first().reset_index()

# Convert column Month back to string
df_group['Month'] = df_group['Month'].dt.strftime('%B-%Y')
df_group

Output:

    Customer    Product          Month            Date         status
0   Sara        grocery          December-2021    15-12-2021   purchased
1   Didi        electronics      January-2022     1-1-2022     purchased
2   John        electronics      January-2022     1-1-2022     refunded
3   Didi        personal-care    March-2022       1-3-2022     refunded
4   Sara        grocery          March-2022       15-3-2022    refunded
5   Didi        personal-care    April-2022       1-4-2022     refunded
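
As a complementary sketch, the mapping attempt from the question appears to work once `sort=False` is passed to `groupby`, because the default `sort=True` re-sorts the group keys as strings after `sort_values` has run. The snippet below reuses `df3` and the `months` dictionary from the question. The names `keys`, `default_order` and `chronological`, and the use of `kind='mergesort'` to keep tied rows in their original order, are illustrative assumptions and not part of the original answer.

```python
import pandas as pd

df3 = pd.DataFrame({
    'Customer': ['Sara', 'John', 'Didi', 'Sara', 'Didi', 'Didi'],
    'Date': ['15-12-2021', '1-1-2022', '1-3-2022', '15-3-2022', '1-1-2022', '1-4-2022'],
    'Month': ['December-2021', 'January-2022', 'March-2022', 'March-2022',
              'January-2022', 'April-2022'],
    'Product': ['grocery', 'electronics', 'personal-care', 'grocery',
                'electronics', 'personal-care'],
    'status': ['purchased', 'refunded', 'refunded', 'refunded', 'purchased', 'refunded'],
})

# Explicit chronological rank for each month label (taken from the question).
months = {'December-2021': 0, 'January-2022': 1, 'February-2022': 2, 'March-2022': 3,
          'April-2022': 4, 'May-2022': 5, 'June-2022': 6, 'July-2022': 7,
          'August-2022': 8, 'September-2022': 9, 'October-2022': 10, 'November-2022': 11}

keys = ['Customer', 'Product', 'Month', 'Date']

# Default behaviour: groupby sorts the group keys itself (sort=True),
# so the row order produced by sort_values is lost.
default_order = (
    df3.sort_values(by=['Month'], key=lambda s: s.map(months))
       .groupby(keys)
       .agg({'status': 'first'})
       .reset_index()
)

# With sort=False the groups come out in order of first appearance,
# which is the chronological order established by sort_values.
chronological = (
    df3.sort_values(by=['Month'], key=lambda s: s.map(months), kind='mergesort')
       .groupby(keys, sort=False)
       .agg({'status': 'first'})
       .reset_index()
)

print(default_order['Month'].tolist())
print(chronological['Month'].tolist())
```

This variant keeps Month as a string throughout, so no conversion back with strftime is needed. The trade-off is that the dictionary has to list every month that occurs in the data: labels missing from it map to NaN and are sorted last.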

This answer to "python - Sorting category values within a groupby in pandas" comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/71352723/
