python - Pandas OR statement ending in series contains

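I have a DataFrame df with columns type and subtype and about 100k rows, and I am trying to classify what kind of data df contains by checking its type/subtype combinations. Although df can contain many different combinations, certain combinations occur only in particular kinds of data. To check whether my object contains any of these combinations, I am currently doing: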

typeA = (((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) | 
         (df.subtype == 5) | (df.subtype == 6))) | 
         ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) | 
         (df.subtype ==  8))))
A = typeA.sum()

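Here typeA is a long Series of mostly False values that may contain some True values; if A > 0, I know it contains at least one True. The problem with this approach is that even if the first row of df yields True, it still has to check every other row. Checking the whole DataFrame is faster than a for loop with a break, but I would like to know whether there is a better way to do this.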

Thanks for any suggestions.

Best answer

Use pandas crosstab:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 10, size=(100, 2)), columns=["type", "subtype"])
counts = pd.crosstab(df.type, df.subtype)

print(counts.loc[0, [2, 3, 5, 6]].sum() + counts.loc[5, [3, 4, 7, 8]].sum())
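
One caveat with the lookup above: in recent pandas versions, `.loc` with a list of labels raises a KeyError if any label is absent, which can happen when a type or subtype value never occurs in the data. The following is a minimal sketch of one way to guard against that, assuming the values lie in 0 to 9 as in the random example; the name `safe_counts` and the small sample frame are made up for illustration.

```python
import pandas as pd

# Small frame in which type 5 and several subtypes never occur.
df = pd.DataFrame({"type": [0, 0, 1, 0], "subtype": [2, 9, 3, 6]})

# reindex adds the missing rows and columns, filled with 0, so the
# .loc lookups below cannot fail on absent labels. range(10) is an
# assumption about the value domain, taken from the random example.
safe_counts = pd.crosstab(df["type"], df["subtype"]).reindex(
    index=range(10), columns=range(10), fill_value=0
)

total = (safe_counts.loc[0, [2, 3, 5, 6]].sum()
         + safe_counts.loc[5, [3, 4, 7, 8]].sum())
print(total)
```

If the set of possible values is not known in advance, the same idea should work with the union of the labels being looked up and the labels already present in the table.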

The result is the same as:

a = (((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) | 
         (df.subtype == 5) | (df.subtype == 6))) | 
         ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) | 
         (df.subtype ==  8))))
a.sum()
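
Since the question only asks whether any matching row exists, the chain of comparisons could probably be shortened with `Series.isin`, and `.any()` states the intent more directly than `.sum() > 0`. Note that `.any()` on a precomputed mask still evaluates every row, so the sketch below also scans the frame in chunks to stop at the first chunk that contains a match. The helper names `type_a_mask` and `contains_type_a` and the chunk size are hypothetical choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

def type_a_mask(frame):
    """Boolean mask of rows whose (type, subtype) pair is a type-A combination."""
    return (
        ((frame["type"] == 0) & frame["subtype"].isin([2, 3, 5, 6]))
        | ((frame["type"] == 5) & frame["subtype"].isin([3, 4, 7, 8]))
    )

def contains_type_a(frame, chunk_size=10_000):
    """Scan the frame chunk by chunk and stop at the first chunk with a match."""
    for start in range(0, len(frame), chunk_size):
        if type_a_mask(frame.iloc[start:start + chunk_size]).any():
            return True
    return False

# Random data shaped like the frame in the question (about 100k rows).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(100_000, 2)),
                  columns=["type", "subtype"])

has_a = bool(type_a_mask(df).any())   # evaluates the full mask
has_a_chunked = contains_type_a(df)   # may return after the first chunk
print(has_a, has_a_chunked)
```

Whether the chunked version is actually faster depends on how early a match tends to appear; when matches are rare or absent it does the same work as the full mask plus some loop overhead, so it is worth timing on real data.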

This post, python - Pandas OR statement ending in series contains, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/20062684/
