python - Aggregate multiple columns in a dataframe based on a custom function

Tags: python pandas dataframe group-by summary

Good afternoon,

I have been trying to solve this problem for a while; any help would be greatly appreciated.

Here is my dataframe:

Channel state       rfq_qty
A        Done       10
B        Tied Done  10
C        Done       10
C        Done       10
C        Done       10
C        Tied Done  10
B        Done       10
B        Done       10
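To reproduce this, the frame above can be built directly. This is a minimal sketch; the values are copied from the table, and a dict is used because the 'Tied Done' state contains a space, which would break whitespace-separated parsing of the table.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    'Channel': ['A', 'B', 'C', 'C', 'C', 'C', 'B', 'B'],
    'state': ['Done', 'Tied Done', 'Done', 'Done', 'Done', 'Tied Done', 'Done', 'Done'],
    'rfq_qty': [10] * 8,
})

# Total rfq_qty used as the denominator for the percentages
print(df['rfq_qty'].sum())  # prints 80
```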

I would like to:

  1. Group by channel, then state
  2. Sum the rfq_qty for each channel
  3. Count the occurrences of each 'done' string in state ('Done' is treated the same as 'Tied Done', i.e. anything with 'done' in it)
  4. Display each channel's rfq_qty as a percentage of the total rfq_qty (80)

Expected output (the state column holds the count of done trades per channel):

Channel  state  rfq_qty  Percentage
A        1      10       0.125
B        3      30       0.375
C        4      40       0.5
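For requirement 3 ("anything with 'done' in it"), a case-insensitive substring match is one way to express the filter. A minimal sketch; the 'Cancelled' and missing values below are hypothetical states added only to show non-matches:

```python
import pandas as pd

# Hypothetical state values, including a non-done state and a missing value
states = pd.Series(['Done', 'Tied Done', 'done', 'Cancelled', None])

# case=False makes 'done' match 'Done', 'Tied Done' and 'done';
# na=False treats missing states as non-matches instead of NaN
mask = states.str.contains('done', case=False, na=False)
print(mask.tolist())  # [True, True, True, False, False]
```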

I have attempted this with the following:

df_Done = df[
                (
                    df['state']=='Done'
                ) 
                | 
                (
                    df['state'] == 'Tied Done'
                )
            ][['Channel','state','rfq_qty']]

df_Done['Percentage_Qty']= df_Done['rfq_qty']/df_Done['rfq_qty'].sum()
# one per done row, so summing per channel gives the number of done trades
df_Done['Done_Trades']= 1

display(
        df_Done[
                (df_Done['Channel'] != 0)
               ].groupby(['Channel'])[['Done_Trades','rfq_qty','Percentage_Qty']].sum().sort_values(['rfq_qty'], ascending=False)
       )

Works but looks convoluted. Any improvements?

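Best answer

I think you can use:

  • first filter with isin and loc
  • groupby and aggregate with agg, passing tuples of new column names and functions
  • add the percentage by dividing with div by the sum
  • if necessary, finally sort_values by rfq_qty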

df_Done = df.loc[df['state'].isin(['Done', 'Tied Done']), ['Channel','state','rfq_qty']]

# to keep every state that contains 'done' (case-insensitive) instead
#df_Done = df[df['state'].str.contains('done', case=False)]

# if necessary, also filter out Channel == 0
#mask = (df['Channel'] != 0) & df['state'].isin(['Done', 'Tied Done'])
#df_Done = df.loc[mask, ['Channel','state','rfq_qty']]

# a list (not a set) keeps the output column order deterministic
d = [('Done_Trades','size'), ('rfq_qty', 'sum')]
df = df_Done.groupby('Channel')['rfq_qty'].agg(d).reset_index()
df['Percentage'] = df['rfq_qty'].div(df['rfq_qty'].sum())
df = df.sort_values('rfq_qty')
print(df)
  Channel  Done_Trades  rfq_qty  Percentage
0       A            1       10       0.125
1       B            3       30       0.375
2       C            4       40       0.500
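Since pandas 0.25, the same result can also be written with named aggregation, where each output column names its source column and function explicitly. A minimal sketch, sorted descending as in the question's own attempt; it assumes pandas 1.0 or later for `ignore_index`:

```python
import pandas as pd

df = pd.DataFrame({
    'Channel': ['A', 'B', 'C', 'C', 'C', 'C', 'B', 'B'],
    'state': ['Done', 'Tied Done', 'Done', 'Done', 'Done', 'Tied Done', 'Done', 'Done'],
    'rfq_qty': [10] * 8,
})

# Filter to done trades, then count rows and sum quantities per channel
out = (
    df[df['state'].isin(['Done', 'Tied Done'])]
    .groupby('Channel')
    .agg(Done_Trades=('state', 'size'), rfq_qty=('rfq_qty', 'sum'))
    .reset_index()
)
# Share of the overall done quantity per channel
out['Percentage'] = out['rfq_qty'] / out['rfq_qty'].sum()
out = out.sort_values('rfq_qty', ascending=False, ignore_index=True)
print(out)
```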

Regarding python - Aggregate multiple columns in a dataframe based on a custom function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49272452/
