python - Aggregate multiple columns in a dataframe based on a custom function

Good afternoon,

I have been trying to solve this for a while, and any help would be appreciated.

Here is my dataframe:

Channel state       rfq_qty
A        Done       10
B        Tied Done  10
C        Done       10
C        Done       10
C        Done       10
C        Tied Done  10
B        Done       10
B        Done       10

I would like to:

Group by channel, then state

Sum the rfq_qty for each channel

Count the occurrences of each 'done' string in state ('Done' is treated the same as 'Tied Done', i.e. anything with 'done' in it)

Display each channel's rfq_qty as a fraction of the total rfq_qty (80)

Channel state   rfq_qty Percentage
A         1       10    0.125
B         3       30    0.375
C         4       40    0.5

I have attempted this with the following:

df_Done = df[
    (df['state'] == 'Done') | (df['state'] == 'Tied Done')
][['Channel', 'state', 'rfq_qty']].copy()

df_Done['Percentage_Qty'] = df_Done['rfq_qty'] / df_Done['rfq_qty'].sum()
# one row per done trade, so summing this column per channel counts the trades
df_Done['Done_Trades'] = 1

display(
    df_Done[df_Done['Channel'] != 0]
    .groupby('Channel')[['Done_Trades', 'rfq_qty', 'Percentage_Qty']]
    .sum()
    .sort_values('rfq_qty', ascending=False)
)

Works but looks convoluted. Any improvements?

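Best answer

I think you can use:

First filter with isin and loc
groupby and aggregate with agg, passing tuples of new column names and functions
Add the percentage by dividing with div by the sum
Finally, if necessary, sort_values by rfq_qty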

# keep only the rows whose state is 'Done' or 'Tied Done'
df_Done = df.loc[df['state'].isin(['Done', 'Tied Done']), ['Channel','state','rfq_qty']]

# alternative: keep every row whose state contains 'Done'
#df_Done = df[df['state'].str.contains('Done')]

# if necessary, also filter out Channel == 0
#mask = (df['Channel'] != 0) & df['state'].isin(['Done', 'Tied Done'])
#df_Done = df.loc[mask, ['Channel','state','rfq_qty']]

# list of (new column name, function) tuples; a list keeps the column order fixed
d = [('rfq_qty', 'sum'), ('Done_Trades','size')]
df = df_Done.groupby('Channel')['rfq_qty'].agg(d).reset_index()
df['Percentage'] = df['rfq_qty'].div(df['rfq_qty'].sum())
df = df.sort_values('rfq_qty')
print (df)
  Channel  rfq_qty  Done_Trades  Percentage
0       A       10            1       0.125
1       B       30            3       0.375
2       C       40            4       0.500
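
The same steps can also be sketched with keyword-style named aggregation, which recent pandas versions document as the way to name output columns in agg. This is only an illustration under some assumptions: the sample data is retyped from the question, the names done and summary are made up here, and the filter matches any state containing 'done' regardless of case, which is one reading of "anything with 'done' in it".

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Channel": ["A", "B", "C", "C", "C", "C", "B", "B"],
    "state": ["Done", "Tied Done", "Done", "Done",
              "Done", "Tied Done", "Done", "Done"],
    "rfq_qty": [10] * 8,
})

# Keep every row whose state contains "done", ignoring case.
# na=False treats missing states as non-matching.
done = df[df["state"].str.contains("done", case=False, na=False)]

# Named aggregation: new_column=(source column, function).
summary = (
    done.groupby("Channel")
        .agg(Done_Trades=("state", "size"), rfq_qty=("rfq_qty", "sum"))
        .reset_index()
)

# Each channel's share of the total quantity.
summary["Percentage"] = summary["rfq_qty"] / summary["rfq_qty"].sum()

# Largest channel first, as in the question's own attempt.
summary = summary.sort_values("rfq_qty", ascending=False).reset_index(drop=True)
print(summary)
```

With the sample data this lists the channels in the order C, B, A with shares 0.5, 0.375 and 0.125. Drop ascending=False to get the ascending order shown in the answer above.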

Regarding python - Aggregate multiple columns in a dataframe based on a custom function, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49272452/
