python - Conditional DataFrame filter on bool columns?

If I have a DataFrame like this:

| id     | attribute_1 | attribute_2 |
|--------|-------------|-------------|
| 123abc | TRUE        | TRUE        |
| 123abc | TRUE        | FALSE       |
| 456def | TRUE        | FALSE       |
| 789ghi | TRUE        | TRUE        |
| 789ghi | FALSE       | FALSE       |
| 789ghi | FALSE       | FALSE       |

How can I apply groupby, or some equivalent filter, to count the unique id values in a subset of the DataFrame like the following:

| id     | attribute_1 | attribute_2 |
|--------|-------------|-------------|
| 123abc | TRUE        | TRUE        |
| 123abc | TRUE        | FALSE       |

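In other words, I want the number of unique id values for which every row of a given id has attribute_1 == True and at least one of its rows has attribute_2 == True.

So 456def would be excluded, because none of its attribute_2 values is True.

789ghi would be excluded, because not all of its attribute_1 entries are True.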

Best answer

You need to groupby twice: first with transform('all') on "attribute_1", then with transform('any') on "attribute_2".

i = df[df.groupby('id').attribute_1.transform('all')]
j = i[i.groupby('id').attribute_2.transform('any')]

print(j)
       id  attribute_1  attribute_2
0  123abc         True         True
1  123abc         True        False

Finally, to get the number of unique IDs that satisfy this condition, call nunique:

print(j['id'].nunique())
1
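For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as a variation on the two-step filter above, combines both conditions into one boolean mask. The names `mask`, `subset`, and `n_ids` are illustrative and not part of the original answer. The single mask should give the same rows here because the first filter keeps or drops whole id groups, so the `any` check sees the same rows either way.

```python
import pandas as pd

# Sample data from the question, with real bool columns.
df = pd.DataFrame({
    "id": ["123abc", "123abc", "456def", "789ghi", "789ghi", "789ghi"],
    "attribute_1": [True, True, True, True, False, False],
    "attribute_2": [True, False, False, True, False, False],
})

g = df.groupby("id")

# Row-aligned masks: transform broadcasts each group's result to its rows.
mask = g["attribute_1"].transform("all") & g["attribute_2"].transform("any")

subset = df[mask]                 # rows of ids that satisfy both conditions
n_ids = subset["id"].nunique()    # number of distinct ids that remain

print(subset)
print(n_ids)
```

With the sample data this keeps only the two 123abc rows.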

This is easiest when your attribute_* columns hold bool values. If they are strings, convert them first:

df = df.replace({'TRUE': True, 'FALSE': False})
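As an alternative to `replace`, the sketch below converts each column with an explicit string comparison, which always yields a boolean column. It assumes the columns contain exactly the strings 'TRUE' and 'FALSE'; note that any other value (including missing values) would silently become False. The names `raw`, `converted`, and `bool_cols` are illustrative.

```python
import pandas as pd

# Hypothetical input where the flags were read in as strings.
raw = pd.DataFrame({
    "id": ["123abc", "456def"],
    "attribute_1": ["TRUE", "TRUE"],
    "attribute_2": ["TRUE", "FALSE"],
})

bool_cols = ["attribute_1", "attribute_2"]

converted = raw.copy()
for col in bool_cols:
    # Comparison returns a boolean Series: True only where the value is 'TRUE'.
    converted[col] = raw[col].eq("TRUE")

print(converted.dtypes)
```

After this, the groupby/transform filter above can be applied to `converted` directly.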

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/52284417/
