python - Selecting users based on column values - pandas dataframe

I'm having trouble selecting the IDs that meet certain conditions in a DataFrame. My DataFrame looks like this:

index    ID    score_1   score_2   ...
   0     22      0          0
   1     22      0          0
   2     22      0          0
   3     23      1          0
   4     23      1          0 
   5     23      1          0
   6     24      0          0
   7     24      0          0
   8     24      0          1
   10    25      0          0
   11    25      0          0
   12    26      0          1
   13    26      0          1
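
To reproduce the problem, the rows shown above can be built as in the minimal sketch below. It assumes the `index` column is the DataFrame index and leaves out the columns hidden behind `...`.

```python
import pandas as pd

# Rows copied from the table above; the index skips 9 exactly as shown there.
df = pd.DataFrame(
    {
        "ID":      [22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 26, 26],
        "score_1": [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
        "score_2": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13],
)
```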

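What I want is the number of IDs that fall into each of the following groups:

score_1 == 0 and score_2 == 0 in every row of the ID, e.g. ID == 22 and ID == 25.
score_1 == 0, and at least one (but not every) row of the ID has score_2 == 1, e.g. ID == 24.
score_1 == 0, and every row of the ID has score_2 == 1, e.g. ID == 26.

Each ID may appear in only one of these groups.

I tried conditional filtering and groupby, but I end up with duplicate IDs, because the filter selects individual rows instead of "remembering" the user. Some of the code I tried: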

# Create a df with only the rows that have score_1 == 0, group by `ID`
zero_IDs = df[df['score_1'] == 0].groupby(by='ID').nunique()
# 'Count' the number of IDs that have only one type of `score_2`
# But this does not differentiate between `0` and `1` values in the score_2 column
zero_IDs[(zero_IDs['score_2'] == 1)].shape[0]
# 'Count' the number of IDs that have at least one `score_2 == 1`
zero_IDs[(zero_IDs['score_2'] > 1)].shape[0]

Can you help me solve this?

Best answer

How about something like this? The result is [22 25] [24] [26].

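# df is the DataFrame from the question; scores are assumed to be 0 or 1,
# so a per-ID sum of 0 means "all rows are 0" and sum == row count means "all rows are 1".
dfsum = df.groupby('ID').sum()
n_rows = df.groupby('ID').score_2.count()  # rows per ID, computed once
s1_zero = dfsum.score_1 == 0
case1 = dfsum[s1_zero & (dfsum.score_2 == 0)].index                              # no score_2 == 1
case2 = dfsum[s1_zero & (dfsum.score_2 > 0) & (dfsum.score_2 < n_rows)].index    # some, not all
case3 = dfsum[s1_zero & (dfsum.score_2 > 0) & (dfsum.score_2 == n_rows)].index   # all rows
print(case1.values)
print(case2.values)
print(case3.values)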
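
As an alternative sketch (not the code from the answer above), the same three groups can be written with per-ID boolean flags instead of sums, which may read closer to the wording of the question. It assumes the scores only take the values 0 and 1. The names `flags`, `classify` and the group labels are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (the extra "..." columns are omitted).
df = pd.DataFrame({
    "ID":      [22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 26, 26],
    "score_1": [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "score_2": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# One row per ID with three boolean flags.
flags = pd.DataFrame({
    "s1_all_zero": df["score_1"].eq(0).groupby(df["ID"]).all(),
    "s2_any_one": df["score_2"].eq(1).groupby(df["ID"]).any(),
    "s2_all_one": df["score_2"].eq(1).groupby(df["ID"]).all(),
})

def classify(row):
    """Hypothetical helper: map one ID's flags to exactly one group label."""
    if not row["s1_all_zero"]:
        return "excluded"   # some row has score_1 != 0
    if row["s2_all_one"]:
        return "all_one"    # every row has score_2 == 1
    if row["s2_any_one"]:
        return "some_one"   # at least one, but not every, row has score_2 == 1
    return "none_one"       # no row has score_2 == 1

groups = flags.apply(classify, axis=1)  # Series indexed by ID
counts = groups.value_counts()          # number of IDs per group
print(groups)
print(counts)
```

Because `classify` returns on the first matching branch, every ID ends up in exactly one group, which is the constraint stated in the question.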

Regarding "python - Selecting users based on column values - pandas dataframe", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51428084/
