python - Logical operators for boolean indexing in Pandas

I am working with boolean indexing in Pandas.

The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with an error?

Example:

import pandas as pd
a = pd.DataFrame({'x':[1,1],'y':[10,20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Best answer

When you say

(a['x']==1) and (a['y']==10)

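you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value - in other words, they raise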

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

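when used as a boolean value. That is because it is unclear when such an object should be True or False. Some users might assume it is True if it has non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Still others might want it to be True if any of its elements is True.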

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.
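As a minimal sketch of the behavior described above, using the DataFrame from the question: the helper raises_value_error is a hypothetical name introduced only for this illustration. It checks that both bool() on a Series and the and operator end in a ValueError.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
mask = a['x'] == 1  # boolean Series with two elements


def raises_value_error(func):
    """Return True if calling func() raises ValueError (illustrative helper)."""
    try:
        func()
    except ValueError:
        return True
    return False


# bool() on a multi-element Series is ambiguous, so pandas raises
bool_raises = raises_value_error(lambda: bool(mask))

# `and` first asks for the truth value of its left operand, so it raises too
and_raises = raises_value_error(lambda: mask and (a['y'] == 10))

print(bool_raises, and_raises)
```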

Instead, you must be explicit and indicate which behavior you want, by checking the empty attribute or by calling the all() or any() method.
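A short sketch of those explicit choices, again on the question's DataFrame. The variable names are illustrative only. Note that on pandas objects empty is read as an attribute, while any() and all() are called as methods.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
mask = a['y'] == 10  # boolean Series: first row True, second row False

any_true = mask.any()   # is at least one element True?
all_true = mask.all()   # is every element True?
is_empty = mask.empty   # does the Series have zero elements? (attribute, no call)

# Each result is a single truth value, so `and` / `not` are safe here
if any_true and not all_true:
    print("some, but not all, rows have y == 10")
```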

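In this case, however, it looks like you do not want boolean evaluation; you want an element-wise logical and. That is what the & binary operator performs: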

(a['x']==1) & (a['y']==10)

returns a boolean array (here, a boolean Series).
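To make the element-wise result concrete, here is a small sketch with the question's data. The names mask and selected are chosen for illustration.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# & combines the two boolean Series element by element
mask = (a['x'] == 1) & (a['y'] == 10)
print(mask.tolist())  # [True, False]

# Used as an index, the mask keeps only the rows where both conditions hold
selected = a[mask]
print(selected)
```

Only row 0 satisfies both conditions, which matches the output shown in the question.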


By the way, as alexpmil notes, the parentheses are mandatory since & has a higher operator precedence than ==.

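Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which in turn is equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series. Using and with two Series again triggers the same ValueError as above. That is why the parentheses are mandatory.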
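The precedence issue can be checked directly. This is a sketch under the same assumptions as the question's example, with outcome and with_parens as illustrative names.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

try:
    # Parsed as a['x'] == (1 & a['y']) == 10, a chained comparison
    # that implicitly applies `and` to two Series
    a[a['x'] == 1 & a['y'] == 10]
    outcome = "no error"
except ValueError:
    outcome = "ValueError"

# With parentheses, each comparison is evaluated before &
with_parens = a[(a['x'] == 1) & (a['y'] == 10)]

print(outcome, len(with_parens))
```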

For python - Logical operators for boolean indexing in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59104587/
