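I am analyzing Excel data that contains many columns. I have extracted the columns I am working on, and I would like to create some new columns based on conditions on the existing ones.
First, my sample dataframe is as follows: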
import pandas as pd

df = pd.DataFrame()
df['Match'] = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
df['HomeGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AwayGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AOS'] = [0.12, 0.12, 0.12, 0.12, 0.12, 0.06, 0.06, 0.06, 0.06, 0.06]
df['% Prob'] = [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05]
The dataframe contains the columns Match, HomeGoal, AwayGoal, AOS and % Prob.
I want to create the following columns:
HomeGoal <1
HomeGoal <2
HomeGoal <3
HomeGoal >=1
HomeGoal >=2
HomeGoal >=3
Each column contains the sum of % Prob over the rows that satisfy the following conditions:
HomeGoal <1 ==> sum of the column % Prob where HomeGoal is less than 1
HomeGoal <2 ==> sum of the column % Prob where HomeGoal is less than 2
HomeGoal <3 ==> sum of the column % Prob where HomeGoal is less than 3
HomeGoal >=1 ==> sum of the columns % Prob and AOS where HomeGoal is 1 or above
HomeGoal >=2 ==> sum of the columns % Prob and AOS where HomeGoal is 2 or above
HomeGoal >=3 ==> sum of the columns % Prob and AOS where HomeGoal is 3 or above
All of the calculations above are done per match.
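To make the rules concrete, here is a small sketch that works through the arithmetic for Match A only, using plain boolean filtering. The names `match_a`, `lt_2` and `ge_2` are just for illustration, and it assumes AOS is added once per match (it is constant within a match in the sample data), which is one reading of the ">=" rules.

```python
import pandas as pd

df = pd.DataFrame({
    'Match': ['A'] * 5 + ['B'] * 5,
    'HomeGoal': [0, 1, 2, 3, 4] * 2,
    'AwayGoal': [0, 1, 2, 3, 4] * 2,
    'AOS': [0.12] * 5 + [0.06] * 5,
    '% Prob': [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05],
})

# Restrict to a single match, since every rule is evaluated per match
match_a = df[df['Match'] == 'A']

# HomeGoal <2: sum of % Prob over the rows with 0 or 1 home goals
lt_2 = match_a.loc[match_a['HomeGoal'] < 2, '% Prob'].sum()

# HomeGoal >=2: sum of % Prob over the rows with 2 or more home goals,
# plus the match's AOS (assumption: AOS is added once, not once per row)
ge_2 = match_a.loc[match_a['HomeGoal'] >= 2, '% Prob'].sum() + match_a['AOS'].iloc[0]

print(round(lt_2, 2), round(ge_2, 2))
```

For Match A this works out to 0.15 + 0.12 = 0.27 for HomeGoal <2, and 0.10 + 0.08 + 0.05 + 0.12 = 0.35 for HomeGoal >=2.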
Can I get your advice?
I have attached the expected result below:
Best answer
Use:
L = [1, 2, 3]
for v in L:
    # new column name
    col = 'HG>={}'.format(v)
    # filter by condition
    df1 = df[df['HomeGoal'] >= v]
    # new Series filled by aggregated values per group, with column AOS added
    df[col] = df1.groupby('Match')['% Prob'].transform('sum') + df['AOS']
    # keep only the first non-missing value per group, set everything else to 0
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

for v in L:
    # same steps for the "less than" columns (AOS is not added here)
    col = 'HG<{}'.format(v)
    df[col] = df[df['HomeGoal'] < v].groupby('Match')['% Prob'].transform('sum')
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

print (df)
  Match  HomeGoal  AwayGoal   AOS  % Prob  HG>=1  HG>=2  HG>=3  HG<1  HG<2  \
0     A         0         0  0.12    0.15   0.00   0.00   0.00  0.15  0.27
1     A         1         1  0.12    0.12   0.47   0.00   0.00  0.00  0.00
2     A         2         2  0.12    0.10   0.00   0.35   0.00  0.00  0.00
3     A         3         3  0.12    0.08   0.00   0.00   0.25  0.00  0.00
4     A         4         4  0.12    0.05   0.00   0.00   0.00  0.00  0.00
5     B         0         0  0.06    0.18   0.00   0.00   0.00  0.18  0.33
6     B         1         1  0.06    0.15   0.44   0.00   0.00  0.00  0.00
7     B         2         2  0.06    0.10   0.00   0.29   0.00  0.00  0.00
8     B         3         3  0.06    0.08   0.00   0.00   0.19  0.00  0.00
9     B         4         4  0.06    0.05   0.00   0.00   0.00  0.00  0.00

   HG<3
0  0.37
1  0.00
2  0.00
3  0.00
4  0.00
5  0.43
6  0.00
7  0.00
8  0.00
9  0.00
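
If one row per match is enough, rather than values placed on individual rows, the same sums could be collected into a small summary table. This is only a sketch and not part of the original answer: the helper name `threshold_summary` and its `thresholds` parameter are made up here, and it assumes AOS is constant within each match, as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Match': ['A'] * 5 + ['B'] * 5,
    'HomeGoal': [0, 1, 2, 3, 4] * 2,
    'AwayGoal': [0, 1, 2, 3, 4] * 2,
    'AOS': [0.12] * 5 + [0.06] * 5,
    '% Prob': [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05],
})


def threshold_summary(df, thresholds=(1, 2, 3)):
    """Hypothetical helper: one row per Match, one column per threshold rule."""
    # Assumption: AOS is the same on every row of a match, so take the first
    aos = df.groupby('Match')['AOS'].first()
    out = pd.DataFrame(index=aos.index)
    for v in thresholds:
        # Zero out rows that fail the condition, then sum per match
        below = df['% Prob'].where(df['HomeGoal'] < v, 0)
        at_or_above = df['% Prob'].where(df['HomeGoal'] >= v, 0)
        out['HG<{}'.format(v)] = below.groupby(df['Match']).sum()
        out['HG>={}'.format(v)] = at_or_above.groupby(df['Match']).sum() + aos
    return out


summary = threshold_summary(df)
print(summary.round(2))
```

Each cell of `summary` should agree with the single non-zero entry per match in the corresponding column of the output above. If needed, it can be joined back onto the original rows with `df.join(summary, on='Match')`.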
Regarding python - Handling aggregation in a pandas dataframe based on constraints in columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53809942/