python - Aggregation in a pandas DataFrame based on constraints in a column

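I am analyzing Excel data that contains many columns, and I have extracted the columns I need. Based on some conditions on the existing columns, I want to create several new columns.

First, my sample DataFrame is as follows: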

import pandas as pd

df = pd.DataFrame()
df['Match'] = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
df['HomeGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AwayGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AOS'] = [0.12, 0.12, 0.12, 0.12, 0.12, 0.06, 0.06, 0.06, 0.06, 0.06]
df['% Prob'] = [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05]

The DataFrame contains the columns Match, HomeGoal, AwayGoal, AOS and % Prob.

I want to create the following columns:

HomeGoal <1, HomeGoal <2, HomeGoal <3, HomeGoal >=1, HomeGoal >=2, HomeGoal >=3

Each column contains the sum of % Prob for the rows that satisfy the following condition:

HomeGoal <1 ==> sum of the % Prob column where HomeGoal is less than 1
HomeGoal <2 ==> sum of the % Prob column where HomeGoal is less than 2
HomeGoal <3 ==> sum of the % Prob column where HomeGoal is less than 3
HomeGoal >=1 ==> sum of the % Prob and AOS columns where HomeGoal is 1 or more
HomeGoal >=2 ==> sum of the % Prob and AOS columns where HomeGoal is 2 or more
HomeGoal >=3 ==> sum of the % Prob and AOS columns where HomeGoal is 3 or more

All of the calculations above are done per match.

Can I get your suggestions?

I have attached the expected result below:

Best answer

Use:

L = [1, 2, 3]

for v in L:
    # new column name
    col = 'HG>={}'.format(v)
    # filter by condition
    df1 = df[df['HomeGoal'] >= v]
    # new Series filled with the aggregated values per group, plus the AOS column
    df[col] = df1.groupby('Match')['% Prob'].transform('sum') + df['AOS']
    # keep only the first non-missing value per group, set everything else to 0
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

for v in L:
    # columns for the "less than" conditions
    col = 'HG<{}'.format(v)
    df[col] = df[df['HomeGoal'] < v].groupby('Match')['% Prob'].transform('sum')
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

print (df)
  Match  HomeGoal  AwayGoal   AOS  % Prob  HG>=1  HG>=2  HG>=3  HG<1  HG<2  HG<3
0     A         0         0  0.12    0.15   0.00   0.00   0.00  0.15  0.27  0.37
1     A         1         1  0.12    0.12   0.47   0.00   0.00  0.00  0.00  0.00
2     A         2         2  0.12    0.10   0.00   0.35   0.00  0.00  0.00  0.00
3     A         3         3  0.12    0.08   0.00   0.00   0.25  0.00  0.00  0.00
4     A         4         4  0.12    0.05   0.00   0.00   0.00  0.00  0.00  0.00
5     B         0         0  0.06    0.18   0.00   0.00   0.00  0.18  0.33  0.43
6     B         1         1  0.06    0.15   0.44   0.00   0.00  0.00  0.00  0.00
7     B         2         2  0.06    0.10   0.00   0.29   0.00  0.00  0.00  0.00
8     B         3         3  0.06    0.08   0.00   0.00   0.19  0.00  0.00  0.00
9     B         4         4  0.06    0.05   0.00   0.00   0.00  0.00  0.00  0.00
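If one row per match is enough, the same sums can also be collected in a small summary table instead of being written back into the original rows. The following is only a sketch of that variant, not part of the answer above: the names `thresholds`, `aos` and `summary` are made up for illustration, and it assumes that AOS is constant within each match, as it is in the sample data.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Match': list('AAAAABBBBB'),
    'HomeGoal': [0, 1, 2, 3, 4] * 2,
    'AwayGoal': [0, 1, 2, 3, 4] * 2,
    'AOS': [0.12] * 5 + [0.06] * 5,
    '% Prob': [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05],
})

thresholds = [1, 2, 3]

# Assumption: AOS has a single value per match, so the first one is taken.
aos = df.groupby('Match')['AOS'].first()

summary = pd.DataFrame(index=aos.index)
for v in thresholds:
    # Zero out the rows that fail the condition, then sum per match.
    below = df['% Prob'].where(df['HomeGoal'] < v, 0)
    above = df['% Prob'].where(df['HomeGoal'] >= v, 0)
    summary['HG<{}'.format(v)] = below.groupby(df['Match']).sum()
    # The ">=" columns add AOS once per match, as in the answer above.
    summary['HG>={}'.format(v)] = above.groupby(df['Match']).sum() + aos

print(summary.round(2))
```

For match A this gives 0.47 for `HG>=1` and 0.37 for `HG<3`, the same figures as in the printed output above. If the values are needed on every row of the original frame, `df.join(summary, on='Match')` attaches them by match.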

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53809942/
