My DataFrame:
display_name security_type1 currency_str state
A GOVT USD Done
B CORP NZD Passed
B CORP USD Done
C CORP EUR Done
C CORP EUR Traded Away
C CORP GBP Done
C CORP GBP Done
C CORP USD Done
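For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is a minimal sketch that assumes every column is a plain string column; the values are copied from the table.

```python
import pandas as pd

# Rebuild the sample frame from the question (all columns assumed to be strings).
df = pd.DataFrame({
    "display_name":   ["A", "B", "B", "C", "C", "C", "C", "C"],
    "security_type1": ["GOVT", "CORP", "CORP", "CORP", "CORP", "CORP", "CORP", "CORP"],
    "currency_str":   ["USD", "NZD", "USD", "EUR", "EUR", "GBP", "GBP", "USD"],
    "state":          ["Done", "Passed", "Done", "Done", "Traded Away", "Done", "Done", "Done"],
})
print(df)
```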
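The result I want:
a. Group by display_name, security_type1 and currency_str.
b. Count the rows in which the state column contains Done and store the count in a Done_RFQ column.
c. Show the total number of rows for each display_name, security_type1 and currency_str combination and store it in a Total_RFQ column.
d. Finally, show Done as a percentage of the total, i.e. Done_Pct = Done_RFQ / Total_RFQ.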
display_name security_type1 currency_str Done_RFQ Total_RFQ Done_Pct
A GOVT USD 1 1 100%
B CORP USD 1 2 50%
C CORP EUR 1 5 20%
C CORP GBP 2 5 40%
C CORP USD 1 5 20%
My code works for everything except Total_RFQ, which means Done_Pct is wrong as well:
d = [('Done_RFQ', 'size')]
df_Done_Client = df[
df['state'].str.contains('Done')
][['display_name','security_type1','currency_str','state']].copy()
df_Done_Client = df_Done_Client.groupby(['display_name','security_type1','currency_str'])['state'].agg(d).reset_index()
# Sum of all Done RFQ's per display_name
Sum_of_Done_For_Month = df_Done_Client.groupby('display_name')['Done_RFQ'].transform('sum')
df_Done_Client['Total_Done_RFQ'] = Sum_of_Done_For_Month
# 'Done_RFQ' is the column created by the aggregation above
df_Done_Client['Done_Pct'] = df_Done_Client['Done_RFQ'].div(Sum_of_Done_For_Month).round(5)
display(df_Done_Client)
I'm not sure how to calculate this total, because it has to come from another DataFrame with the same fields but without the "Done" filter:
df_All_Client = df[['display_name','security_type1','currency_str','state']].copy()
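Best answer
I think you need a Total_RFQ column computed with size (the total row count) and a Done_RFQ column computed from a boolean mask: compare state with Done and take the sum of the True values: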
d = [('Total_RFQ', 'size'), ('Done_RFQ', lambda x: x.eq('Done').sum())]
df=df.groupby(['display_name','security_type1','currency_str'])['state'].agg(d).reset_index()
df['Done_Pct'] = df['Done_RFQ'] / df['Total_RFQ'] * 100
print (df)
display_name security_type1 currency_str Total_RFQ Done_RFQ Done_Pct
0 A GOVT USD 1 1 100.0
1 B CORP NZD 1 0 0.0
2 B CORP USD 1 1 100.0
3 C CORP EUR 2 1 50.0
4 C CORP GBP 2 2 100.0
5 C CORP USD 1 1 100.0
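The same counts can also be written with named aggregation, which avoids the list of tuples and the lambda. This is only a sketch, assuming pandas 0.25 or later; is_done is a hypothetical helper column that does not appear in the question, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "display_name":   ["A", "B", "B", "C", "C", "C", "C", "C"],
    "security_type1": ["GOVT", "CORP", "CORP", "CORP", "CORP", "CORP", "CORP", "CORP"],
    "currency_str":   ["USD", "NZD", "USD", "EUR", "EUR", "GBP", "GBP", "USD"],
    "state":          ["Done", "Passed", "Done", "Done", "Traded Away", "Done", "Done", "Done"],
})

keys = ["display_name", "security_type1", "currency_str"]

out = (
    # is_done (hypothetical name) is True where state equals "Done" exactly
    df.assign(is_done=df["state"].eq("Done"))
      .groupby(keys)
      .agg(
          Total_RFQ=("state", "size"),   # all rows in the group
          Done_RFQ=("is_done", "sum"),   # True counts as 1
      )
      .reset_index()
)
out["Done_Pct"] = out["Done_RFQ"] / out["Total_RFQ"] * 100
print(out)
```

Summing a boolean column counts its True values, so no separate filtered DataFrame is needed for the Done rows.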
If you need to check for a substring instead:
d = [('Total_RFQ', 'size'), ('Done_RFQ', lambda x: x.str.contains('Done').sum())]
df=df.groupby(['display_name','security_type1','currency_str'])['state'].agg(d).reset_index()
df['Done_Pct'] = df['Done_RFQ'] / df['Total_RFQ'] * 100
print (df)
display_name security_type1 currency_str Total_RFQ Done_RFQ Done_Pct
0 A GOVT USD 1 1 100.0
1 B CORP NZD 1 0 0.0
2 B CORP USD 1 1 100.0
3 C CORP EUR 2 1 50.0
4 C CORP GBP 2 2 100.0
5 C CORP USD 1 1 100.0
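Note that the expected table in the question appears to use the number of rows per display_name as Total_RFQ (2 for B, 5 for C) rather than the number of rows per combination. If that is what is wanted, one possible sketch is to count per combination first and then broadcast a per-display_name total with transform. Rows and is_done are hypothetical helper names, and the reading of the expected table is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "display_name":   ["A", "B", "B", "C", "C", "C", "C", "C"],
    "security_type1": ["GOVT", "CORP", "CORP", "CORP", "CORP", "CORP", "CORP", "CORP"],
    "currency_str":   ["USD", "NZD", "USD", "EUR", "EUR", "GBP", "GBP", "USD"],
    "state":          ["Done", "Passed", "Done", "Done", "Traded Away", "Done", "Done", "Done"],
})

keys = ["display_name", "security_type1", "currency_str"]

per_group = (
    df.assign(is_done=df["state"].eq("Done"))
      .groupby(keys)
      .agg(Done_RFQ=("is_done", "sum"), Rows=("state", "size"))
      .reset_index()
)

# Assumption: Total_RFQ is the row count of the whole display_name,
# repeated on every combination that belongs to it.
per_group["Total_RFQ"] = per_group.groupby("display_name")["Rows"].transform("sum")
per_group["Done_Pct"] = (per_group["Done_RFQ"] / per_group["Total_RFQ"] * 100).round(1)
print(per_group.drop(columns="Rows"))
```

Unlike the expected table, this keeps the B / CORP / NZD combination, with a Done_RFQ of 0; filter on Done_RFQ > 0 if such rows should be hidden.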
Source: "python - Pandas Groupby : Aggregations on the same column but totals based on two different critera/dataframes" on Stack Overflow: https://stackoverflow.com/questions/51241508/