import pandas as pd

test = pd.DataFrame({'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
                     'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
                     'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
                     'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
                     'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0]})
   crash_drinking  crash_drugs  driver_drinking  driver_drugged  injury
0               1            0                1               0       A
1               1            0                1               0       B
2               1            0                0               0       B
3               0            1                0               0       A
4               0            1                0               1       A
5               0            0                0               0       C
6               1            0                0               0       A
7               0            1                1               1       B
8               1            1                0               0       A
I would like my output to look like this (column names changed to distinguish them from the dataframe above):
   drinking crash  drinking driver in crash  drugged crash  drugged driver in crash
A               2                         1              2                        1
B               2                         1              1                        0
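For the first row, "injury" = 'A', and each column is a count of the rows that pass the following filters:
"drinking crash" is the count where crash_drinking = 1 and crash_drugs = 0;
"drinking driver in crash" is the count where crash_drinking = 1, crash_drugs = 0, driver_drinking = 1, and driver_drugged = 0;
"drugged crash" is the count where crash_drinking = 0 and crash_drugs = 1;
"drugged driver in crash" is the count where crash_drinking = 0, crash_drugs = 1, driver_drinking = 0, and driver_drugged = 1.
Row B is the same, except that "injury" = 'B'.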
Right now I just have a bunch of .loc filters set up:
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0) & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0)]
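and so on.
I would rather do this with groupby or .apply(), since I assume that would be faster than running all of these queries one by one. But I'm not sure of the correct syntax. Maybe I should do a .groupby() on the "injury" column and go from there...?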
Best answer
# Build one boolean column per condition (test is the dataframe from the question)
result = pd.DataFrame()
result['drinking crash'] = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
result['drinking driver in crash'] = ((test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
                                      & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0))
result['drugged crash'] = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
result['drugged driver in crash'] = ((test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
                                     & (test['driver_drinking'] == 0) & (test['driver_drugged'] == 1))
# Convert True/False to 1/0 so that summing yields counts
result = result.astype(int)
result['injury'] = test['injury']
# Sum the flags within each injury group
result.groupby('injury').sum()
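For reference, here is a self-contained sketch of the same idea that can be run as-is. It is one possible variant, not the only way to write it: the names masks, drinking_crash, drugged_crash and counts are made up for illustration, and the crash-level conditions are stored once and reused for the driver-level ones. Grouping by the injury Series directly avoids copying that column into the frame of flags.

```python
import pandas as pd

# Sample data from the question
test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Crash-level conditions (illustrative names), reused by the driver-level ones
drinking_crash = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
drugged_crash = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)

# One boolean column per output column
masks = pd.DataFrame({
    'drinking crash': drinking_crash,
    'drinking driver in crash': (drinking_crash
                                 & (test['driver_drinking'] == 1)
                                 & (test['driver_drugged'] == 0)),
    'drugged crash': drugged_crash,
    'drugged driver in crash': (drugged_crash
                                & (test['driver_drinking'] == 0)
                                & (test['driver_drugged'] == 1)),
})

# Cast to int and sum per injury group; each sum is a count of matching rows
counts = masks.astype(int).groupby(test['injury']).sum()

# Every injury value gets a row, including 'C' (all zeros for this data),
# so select the rows of interest explicitly
print(counts.loc[['A', 'B']])
```

On the sample data, rows A and B come out as 2, 1, 2, 1 and 2, 1, 1, 0, matching the table in the question. Row C is also present in counts, with all zeros, because it matches none of the conditions.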
Regarding "python - Pandas: complex filtering via groupby", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40183452/