python - Pandas: complex filtering via groupby

Tags: python pandas

import pandas as pd
test = pd.DataFrame({'injury':['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'], 'crash_drinking':[1, 1, 1, 0, 0, 0, 1, 0, 1], 'crash_drugs':[0,0,0,1,1,0,0,1,1], 'driver_drinking':[1,1,0,0,0,0,0,1,0], 'driver_drugged':[0,0,0,0,1,0,0,1,0]})

   crash_drinking  crash_drugs  driver_drinking  driver_drugged injury
0               1            0                1               0      A
1               1            0                1               0      B
2               1            0                0               0      B
3               0            1                0               0      A
4               0            1                0               1      A
5               0            0                0               0      C
6               1            0                0               0      A
7               0            1                1               1      B
8               1            1                0               0      A

I would like my output to look like this (column names changed to distinguish them from the DataFrame above):

    drinking crash  drinking driver in crash    drugged crash   drugged driver in crash
A                2                        1                 2                         1
B                2                        1                 1                         0

For the first row, "injury" = 'A', with the following filters:

"drinking crash" is the count of rows where crash_drinking = 1 and crash_drugs = 0;

"drinking driver in crash" is crash_drinking = 1, crash_drugs = 0, driver_drinking = 1, and driver_drugged = 0;

"drugged crash" is crash_drinking = 0 and crash_drugs = 1;

"drugged driver in crash" is crash_drinking = 0, crash_drugs = 1, driver_drinking = 0, and driver_drugged = 1.

Row B is the same, except that "injury" = 'B'.

Right now I just have a bunch of .loc filters set up:

test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0) & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0)]

and so on.
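For reference, here is a minimal sketch of how one such .loc filter turns into a single cell of the desired table: the count is just the number of rows that survive the filter. The variable names `is_a`, `drinking_crash_a`, and `drugged_crash_a` are illustrative only and do not appear in the original question.

```python
import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Hypothetical helper mask: rows where injury is 'A'.
is_a = test['injury'] == 'A'

# Each cell of the desired table is the row count of one filtered frame.
drinking_crash_a = len(test.loc[is_a & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)])
drugged_crash_a = len(test.loc[is_a & (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)])

print(drinking_crash_a, drugged_crash_a)
```

Repeating this for every injury value and every filter is what makes the approach tedious.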

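I would rather do this with groupby or .apply(), since I assume that would be faster than running all of these queries one by one. But I am not sure of the correct syntax. Maybe I should do a .groupby() on the "injury" column and go from there...?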

Best answer

# Build one boolean column per filter, then count the True values per injury group.
result = pd.DataFrame()
result['drinking crash'] = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
result['drinking driver in crash'] = ((test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
                                      & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0))
result['drugged crash'] = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
result['drugged driver in crash'] = ((test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
                                     & (test['driver_drinking'] == 0) & (test['driver_drugged'] == 1))
result = result.astype(int)  # True/False -> 1/0 so that sum() counts the matching rows
result['injury'] = test['injury']
result.groupby('injury').sum()

Resulting DataFrame: one row per injury value, with the count for each filter in its own column.
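The same idea can be written a little more compactly. The following is only a sketch of one possible variant, not the original answerer's code: it reuses the two crash-level masks, groups directly by the `injury` Series instead of copying it into the result, and then selects rows A and B to match the table requested in the question. The names `drinking_crash`, `drugged_crash`, `masks`, `counts`, and `counts_ab` are illustrative assumptions.

```python
import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Crash-level masks, shared by the driver-level filters below.
drinking_crash = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
drugged_crash = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)

# One boolean column per filter, in the order of the desired output.
masks = pd.DataFrame({
    'drinking crash': drinking_crash,
    'drinking driver in crash': drinking_crash
        & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0),
    'drugged crash': drugged_crash,
    'drugged driver in crash': drugged_crash
        & (test['driver_drinking'] == 0) & (test['driver_drugged'] == 1),
})

# Cast to int and sum per injury group; grouping by the Series aligns on the index.
counts = masks.astype(int).groupby(test['injury']).sum()

# Keep only the injury values shown in the question's expected output.
counts_ab = counts.loc[['A', 'B']]
print(counts_ab)
```

Grouping by the Series keeps the mask frame purely numeric, so there is no need to cast before attaching a string column. Injury value C still appears in `counts` with all zeros, which is why the last step selects A and B explicitly.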

Regarding "python - Pandas: complex filtering via groupby", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40183452/
