Compare the values in two columns and extract the values of a third column in a DataFrame
df=
Expected output df =
Best answer
Example code
import pandas as pd

data = {'Location': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3},
        'teams': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'C', 5: 'B'},
        'goals': {0: 5, 1: 6, 2: 7, 3: 5, 4: 6, 5: 7}}
df = pd.DataFrame(data)
First, aggregate with groupby:
# 'sum' is passed as a string; recent pandas versions warn when given the builtin sum
(df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])
   .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))
Output:
teams        A          B          C
         count  sum count  sum count  sum
Location
1          1.0  5.0   1.0  6.0   NaN  NaN
2          1.0  7.0   1.0  5.0   1.0  6.0
3          NaN  NaN   1.0  7.0   NaN  NaN
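To see what each call in the chain contributes, the steps can be run one at a time. This is a minimal sketch using the same sample data; the intermediate names `agg` and `wide` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': [1, 1, 2, 2, 2, 3],
    'teams': ['A', 'B', 'A', 'B', 'C', 'B'],
    'goals': [5, 6, 7, 5, 6, 7],
})

# One row per (Location, teams) pair, with 'count' and 'sum' columns.
agg = df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])

# Move 'teams' from the row index into the columns.
# Column levels are now (statistic, team); missing pairs become NaN.
wide = agg.unstack()

# Put the team on the outer column level, then sort so that each
# team's 'count' and 'sum' columns sit next to each other.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(wide)
```

The NaN entries mark Location/team pairs that have no rows in the input, which is also why the counts are shown as floats.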
Second, create idx, which will be used to rename the columns:
idx = pd.MultiIndex.from_product([df['teams'].unique(), ['Team', 'Team Goal']]).map(lambda x: ' '.join(x))
idx
Index(['A Team', 'A Team Goal', 'B Team', 'B Team Goal', 'C Team', 'C Team Goal'], dtype='object')
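One caveat: `df['teams'].unique()` returns the teams in order of first appearance, while the columns produced by `sort_index(axis=1)` are in sorted order. The two agree for the sample data, but not in general. A minimal sketch of one way around this is to build the labels from the aggregated columns themselves. The `suffix` mapping and the small reordered data set below are assumptions made for illustration.

```python
import pandas as pd

# Sample data in which 'B' appears before 'A'.
df = pd.DataFrame({
    'Location': [1, 1, 2],
    'teams': ['B', 'A', 'B'],
    'goals': [6, 5, 7],
})

wide = (df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])
          .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))

# Hypothetical mapping from statistic name to the label suffix.
suffix = {'count': 'Team', 'sum': 'Team Goal'}

# Build the labels from the columns that are actually present,
# in the order they actually have.
labels = [f'{team} {suffix[stat]}' for team, stat in wide.columns]

print(list(df['teams'].unique()))  # order of appearance: B, then A
print(labels)                      # sorted order: A first, then B
```

Because the labels follow `wide.columns`, they stay aligned with the data regardless of the row order in the input.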
Finally, replace the column labels and call reset_index (this includes the code from the first step):
(df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])
   .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
   .set_axis(idx, axis=1).reset_index())
Output:
   Location  A Team  A Team Goal  B Team  B Team Goal  C Team  C Team Goal
0         1     1.0          5.0     1.0          6.0     NaN          NaN
1         2     1.0          7.0     1.0          5.0     1.0          6.0
2         3     NaN          NaN     1.0          7.0     NaN          NaN
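The counts and sums appear as floats (1.0, 5.0) because NaN forces a float dtype. If integer display is preferred, one option is to cast the statistic columns to pandas' nullable `Int64` dtype. This is a sketch of that step, not part of the original answer; `result` and `stat_cols` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': [1, 1, 2, 2, 2, 3],
    'teams': ['A', 'B', 'A', 'B', 'C', 'B'],
    'goals': [5, 6, 7, 5, 6, 7],
})

idx = pd.MultiIndex.from_product(
    [df['teams'].unique(), ['Team', 'Team Goal']]
).map(lambda x: ' '.join(x))

result = (df.groupby(['Location', 'teams'])['goals'].agg(['count', 'sum'])
            .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
            .set_axis(idx, axis=1).reset_index())

# Cast every column except 'Location' to nullable integers;
# NaN becomes <NA> and the remaining values become whole numbers.
stat_cols = result.columns.drop('Location')
result = result.astype({col: 'Int64' for col in stat_cols})

print(result)
```

The cast only succeeds when every non-missing value is a whole number, which holds here because the inputs are integer goal counts.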
For python - Compare the values in two columns and extract the values of a third column in a DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74750053/