python - Handling aggregation in a pandas DataFrame based on constraints in a column

Tags: python pandas dataframe

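I am analysing Excel data that contains many columns, and I have extracted the columns I need. Based on some conditions on the existing columns, I want to create some new columns.

First, my sample DataFrame is as follows: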

import pandas as pd

df = pd.DataFrame()
df['Match'] = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
df['HomeGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AwayGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AOS'] = [0.12, 0.12, 0.12, 0.12, 0.12, 0.06, 0.06, 0.06, 0.06, 0.06]
df['% Prob'] = [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05]

The DataFrame contains the columns Match, HomeGoal, AwayGoal, AOS and % Prob.

I want to create the following columns:

HomeGoal <1
HomeGoal <2
HomeGoal <3
HomeGoal >=1
HomeGoal >=2
HomeGoal >=3

Each column contains the sum of % Prob over the rows that satisfy the following conditions:

HomeGoal <1 ==> sum of the % Prob column where HomeGoal is less than 1
HomeGoal <2 ==> sum of the % Prob column where HomeGoal is less than 2
HomeGoal <3 ==> sum of the % Prob column where HomeGoal is less than 3
HomeGoal >=1 ==> sum of the % Prob column, plus AOS, where HomeGoal is 1 or more
HomeGoal >=2 ==> sum of the % Prob column, plus AOS, where HomeGoal is 2 or more
HomeGoal >=3 ==> sum of the % Prob column, plus AOS, where HomeGoal is 3 or more

All of the calculations above are done per match.

Could I get your advice?

I attached my expected result as an image (not reproduced here).

Best answer

Use:

L = [1, 2, 3]

for v in L:
    # new column name
    col = 'HG>={}'.format(v)
    # keep only the rows that satisfy the condition
    df1 = df[df['HomeGoal'] >= v]
    # per-match sum of % Prob over the filtered rows, plus AOS;
    # rows that were filtered out become NaN through index alignment
    df[col] = df1.groupby('Match')['% Prob'].transform('sum') + df['AOS']
    # keep only the first non-missing value per group, set the rest to 0
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

for v in L:
    # the label uses "<" to match the condition below
    col = 'HG<{}'.format(v)
    df[col] = df[df['HomeGoal'] < v].groupby('Match')['% Prob'].transform('sum')
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

print(df)

  Match  HomeGoal  AwayGoal   AOS  % Prob  HG>=1  HG>=2  HG>=3  HG<1  HG<2  HG<3
0     A         0         0  0.12    0.15   0.00   0.00   0.00  0.15  0.27  0.37
1     A         1         1  0.12    0.12   0.47   0.00   0.00  0.00  0.00  0.00
2     A         2         2  0.12    0.10   0.00   0.35   0.00  0.00  0.00  0.00
3     A         3         3  0.12    0.08   0.00   0.00   0.25  0.00  0.00  0.00
4     A         4         4  0.12    0.05   0.00   0.00   0.00  0.00  0.00  0.00
5     B         0         0  0.06    0.18   0.00   0.00   0.00  0.18  0.33  0.43
6     B         1         1  0.06    0.15   0.44   0.00   0.00  0.00  0.00  0.00
7     B         2         2  0.06    0.10   0.00   0.29   0.00  0.00  0.00  0.00
8     B         3         3  0.06    0.08   0.00   0.00   0.19  0.00  0.00  0.00
9     B         4         4  0.06    0.05   0.00   0.00   0.00  0.00  0.00  0.00
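
If one row per match is enough, rather than values placed on individual rows of the original frame, the same conditional sums can probably be collected into a small summary table. The following is only a sketch under stated assumptions: `per_match_summary` is a hypothetical helper name, not part of pandas or of the answer above, and it assumes AOS is constant within each match, as it is in the sample data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Match': ['A'] * 5 + ['B'] * 5,
    'HomeGoal': [0, 1, 2, 3, 4] * 2,
    'AwayGoal': [0, 1, 2, 3, 4] * 2,
    'AOS': [0.12] * 5 + [0.06] * 5,
    '% Prob': [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05],
})


def per_match_summary(frame, thresholds=(1, 2, 3)):
    """Hypothetical helper: one row per Match, one column per condition."""
    prob = frame['% Prob']
    match = frame['Match']
    # Assumption: AOS is constant within a match, so the first value represents it.
    aos = frame.groupby('Match')['AOS'].first()
    out = pd.DataFrame(index=aos.index)
    for v in thresholds:
        # Series.where keeps % Prob where the condition holds and puts 0 elsewhere,
        # so a plain per-match sum only counts the qualifying rows.
        below = prob.where(frame['HomeGoal'] < v, 0).groupby(match).sum()
        at_or_above = prob.where(frame['HomeGoal'] >= v, 0).groupby(match).sum()
        out['HG<{}'.format(v)] = below
        # AOS is added once per match, as in the loop-based answer.
        out['HG>={}'.format(v)] = at_or_above + aos
    return out.round(2)


summary = per_match_summary(df)
print(summary)
```

For match A this gives 0.27 for HG<2 and 0.35 for HG>=2, the same non-zero values that appear in the printed frame above. The summary can be joined back onto the original frame with `df.merge(summary, left_on='Match', right_index=True)` if the values are needed on every row.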

Regarding "python - Handling aggregation in a pandas DataFrame based on constraints in a column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53809942/
