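In the df below, the variable 'group' contains three groups: 'A', 'AB', and 'C'. The other columns in the df are assigned to a specific group by their suffix: var1_A relates to group A, and so on.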
import pandas as pd
data = pd.DataFrame({'group':['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
'var1_A':['pass', 'fail', 'pass','fail', 'pass']*2,
'var2_A':['pass', 'pass', 'pass','fail', 'pass']*2,
'var1_AB':['pass', 'pass', 'pass','fail', 'pass']*2,
'var2_AB':['pass', 'pass', 'fail','fail', 'pass']*2,
'var1_C':['pass', 'pass', 'pass','fail', 'pass']*2,
'var2_C': ['fail', 'fail', 'fail','fail', 'pass']*2
})
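For each row, I want to count how many times 'pass' occurs. For instances belonging to group A, I only want to count the variables connected to group A. I want the result in a new column. This almost does the job.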
data['new_col'] = data[data['group']=='A'][['var1_A', 'var2_A']].isin(['pass']).sum(1)
data['new_col'] = data[data['group']=='AB'][['var1_AB', 'var2_AB']].isin(['pass']).sum(1)
data['new_col'] = data[data['group']=='C'][['var1_C', 'var2_C']].isin(['pass']).sum(1)
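However, I want the results for all groups in the same column. Perhaps this could be done with groupby and transform? I am having trouble figuring it out, though.
Target dataframe: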
pd.DataFrame({'group':['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
'var1_A':['pass', 'fail', 'pass','fail', 'pass']*2,
'var2_A':['pass', 'pass', 'pass','fail', 'pass']*2,
'var1_AB':['pass', 'pass', 'pass','fail', 'pass']*2,
'var2_AB':['pass', 'pass', 'fail','fail', 'pass']*2,
'var1_C':['pass', 'pass', 'pass','fail', 'pass']*2,
'var2_C': ['fail', 'fail', 'fail','fail', 'pass']*2,
'result':[2,2,2,0,2,1,1,2,0,2]
})
Best answer
You can melt, filter, and use groupby.count:
data['result'] = (data
.rename(columns=lambda x: x.split('_')[-1]) # get only part after "_"
.reset_index().melt(['index', 'group'])
# keep only identical groups and "pass" values
.loc[lambda d: d['group'].eq(d['variable']) & d['value'].eq('pass')]
.groupby('index')['value'].count()
.reindex(data.index, fill_value=0)
)
print(data)
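To make the chain above easier to follow, here is a minimal sketch that performs the same kind of melt, filter, and count as separate steps. It is a rewrite for illustration rather than the answerer's code: the names `long_df`, `kept`, `counts`, and the helper column `suffix` are assumptions introduced here, and the suffix is extracted after melting instead of by renaming the columns first.

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
    'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass'] * 2,
    'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass'] * 2,
    'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass'] * 2,
})

# Step 1: long format, one row per (original row, variable) pair.
long_df = data.reset_index().melt(id_vars=['index', 'group'])

# Step 2: reduce each variable name to its suffix, e.g. 'var1_AB' -> 'AB'.
# ('suffix' is a hypothetical helper column, not part of the original answer.)
long_df['suffix'] = long_df['variable'].str.split('_').str[-1]

# Step 3: keep cells whose suffix equals the row's group and whose value is 'pass'.
kept = long_df[(long_df['suffix'] == long_df['group']) & (long_df['value'] == 'pass')]

# Step 4: count surviving cells per original row; rows with no match get 0.
counts = kept.groupby('index').size().reindex(data.index, fill_value=0)

print(counts.tolist())  # [2, 2, 2, 0, 2, 1, 1, 2, 0, 2]
```

The final `reindex` matters because rows without any matching 'pass' (rows 3 and 8 here) drop out entirely during filtering and would otherwise be missing from the result.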
Or another approach, using matrix and string comparison:
df2 = data.set_index('group').eq('pass')
data['result'] = (df2.mul(df2.columns.str.extract(r'_(\w+)', expand=False))
.eq(df2.index, axis=0).sum(axis=1)
.to_numpy()
)
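The matrix idea can also be written with two explicit boolean masks, which may be easier to inspect than multiplying booleans by strings. This is a sketch of a variation, not the answerer's code: the names `value_cols`, `suffixes`, `belongs`, and `passed` are assumptions introduced here, and it assumes every column other than 'group' carries a group suffix after its last underscore.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
    'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass'] * 2,
    'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass'] * 2,
    'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass'] * 2,
})

value_cols = [c for c in data.columns if c != 'group']

# Suffix of each value column, e.g. 'var2_AB' -> 'AB'.
suffixes = np.array([c.split('_')[-1] for c in value_cols], dtype=object)
groups = data['group'].to_numpy(dtype=object)

# belongs[i, j] is True when column j is assigned to the group of row i.
# Exact string equality means group 'A' does not pick up the '_AB' columns.
belongs = groups[:, None] == suffixes[None, :]

# passed[i, j] is True when the cell in row i, column j holds 'pass'.
passed = data[value_cols].eq('pass').to_numpy()

# A cell counts only if it belongs to the row's group and is a 'pass'.
data['result'] = (belongs & passed).sum(axis=1)

print(data['result'].tolist())  # [2, 2, 2, 0, 2, 1, 1, 2, 0, 2]
```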
Output:
  group var1_A var2_A var1_AB var2_AB var1_C var2_C result
0     A   pass   pass    pass    pass   pass   fail      2
1    AB   fail   pass    pass    pass   pass   fail      2
2     A   pass   pass    pass    fail   pass   fail      2
3    AB   fail   fail    fail    fail   fail   fail      0
4    AB   pass   pass    pass    pass   pass   pass      2
5     C   pass   pass    pass    pass   pass   fail      1
6     C   fail   pass    pass    pass   pass   fail      1
7     A   pass   pass    pass    fail   pass   fail      2
8     A   fail   fail    fail    fail   fail   fail      0
9    AB   pass   pass    pass    pass   pass   pass      2
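The question also asks whether groupby could do this. The attempt in the question loses data because each `data['new_col'] = ...` assignment replaces the whole column, so only the last group's counts survive. A minimal sketch of a groupby-based loop that writes each group's counts into its own rows follows. It is not part of the answer above, and the names `result`, `rows`, and `cols` are assumptions introduced here.

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'AB', 'A', 'AB', 'AB', 'C', 'C', 'A', 'A', 'AB'],
    'var1_A': ['pass', 'fail', 'pass', 'fail', 'pass'] * 2,
    'var2_A': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var1_AB': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_AB': ['pass', 'pass', 'fail', 'fail', 'pass'] * 2,
    'var1_C': ['pass', 'pass', 'pass', 'fail', 'pass'] * 2,
    'var2_C': ['fail', 'fail', 'fail', 'fail', 'pass'] * 2,
})

# Start from zeros so every row has a value even if its group has no columns.
result = pd.Series(0, index=data.index)

for name, rows in data.groupby('group'):
    # Columns whose suffix is exactly this group's name
    # (an exact match, so 'A' does not select the '_AB' columns).
    cols = [c for c in data.columns if '_' in c and c.split('_')[-1] == name]
    # Write this group's per-row counts into its own rows only.
    result.loc[rows.index] = rows[cols].eq('pass').sum(axis=1)

data['result'] = result

print(data['result'].tolist())  # [2, 2, 2, 0, 2, 1, 1, 2, 0, 2]
```

The loop runs once per group rather than once per row, so it stays cheap when there are few groups, at the cost of being less compact than the vectorized versions.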
Regarding "python - Count occurrences within groups using different columns for each group", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/75030573/