python - Testing whether a Numpy array contains a given row

Tags: python numpy

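Is there a Pythonic and efficient way to check whether a Numpy array contains at least one instance of a given row? By "efficient" I mean that it terminates upon finding the first matching row, rather than iterating over the entire array even after a result has been found.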

With Python lists this can be done very cleanly with if row in array:, but this does not work as I would expect for Numpy arrays, as illustrated below.

With Python lists:

>>> a = [[1,2],[10,20],[100,200]]
>>> [1,2] in a
True
>>> [1,20] in a
False

But Numpy arrays give different and rather odd-looking results. (The __contains__ method of ndarray seems to be undocumented.)

>>> a = np.array([[1,2],[10,20],[100,200]])
>>> np.array([1,2]) in a
True
>>> np.array([1,20]) in a
True
>>> np.array([1,42]) in a
True
>>> np.array([42,1]) in a
False
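The results above are consistent with ndarray.__contains__ behaving like (a == x).any(): x is broadcast against a, and a match on any single element is enough. The sketch below compares the two. The helper name elementwise_contains is made up for illustration and is not part of NumPy.

```python
import numpy as np

a = np.array([[1, 2], [10, 20], [100, 200]])

def elementwise_contains(arr, x):
    # Hypothetical helper: broadcast x against arr, compare element by
    # element, and report whether any single element matched.
    return bool((arr == x).any())

for row in ([1, 2], [1, 20], [1, 42], [42, 1]):
    x = np.array(row)
    # Print the row, the result of `in`, and the element-wise equivalent.
    print(row, x in a, elementwise_contains(a, x))
```

Under this reading, [1, 42] is reported as present because its first element matches a[0, 0], while [42, 1] matches no element in either column.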

Best answer

You can use .tolist():

>>> a = np.array([[1,2],[10,20],[100,200]])
>>> [1,2] in a.tolist()
True
>>> [1,20] in a.tolist()
False
>>> [1,42] in a.tolist()
False
>>> [42,1] in a.tolist()
False

Or use a view:

>>> any((a[:]==[1,2]).all(1))
True
>>> any((a[:]==[1,20]).all(1))
False

Or use a generator over the numpy array (potentially VERY slow):

any(([1,2] == x).all() for x in a)     # stops on first occurrence 

Or use numpy logic functions:

any(np.equal(a,[1,2]).all(1))
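The vectorized variants can be wrapped in a small helper. This is a minimal sketch, assuming a 2-D array and a 1-D candidate row. The name contains_row and the shape check are my additions, not something NumPy provides.

```python
import numpy as np

def contains_row(arr, row):
    """Return True if any row of the 2-D array `arr` equals `row`."""
    arr = np.asarray(arr)
    row = np.asarray(row)
    # Guard against silent broadcasting: a candidate of the wrong shape
    # cannot be a row of arr.
    if arr.ndim != 2 or row.shape != (arr.shape[1],):
        return False
    # Compare every row element-wise, require all columns to match,
    # then ask whether at least one row matched.
    return bool((arr == row).all(axis=1).any())

a = np.array([[1, 2], [10, 20], [100, 200]])
print(contains_row(a, [1, 2]))
print(contains_row(a, [1, 20]))
```

Note that this always scans the whole array, so it does not terminate early on the first match.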

If you time these:

import numpy as np
import time

n=300000
a=np.arange(n*3).reshape(n,3)
b=a.tolist()

t1,t2,t3=a[n//100][0],a[n//2][0],a[-10][0]

tests=[ ('early hit',[t1, t1+1, t1+2]),
        ('middle hit',[t2,t2+1,t2+2]),
        ('late hit', [t3,t3+1,t3+2]),
        ('miss',[0,2,0])]

fmt='\t{:20}{:.5f} seconds and is {}'     

for test, tgt in tests:
    print('\n{}: {} in {:,} elements:'.format(test,tgt,n))

    name='view'
    t1=time.time()
    result=(a[...]==tgt).all(1).any()
    t2=time.time()
    print(fmt.format(name,t2-t1,result))

    name='python list'
    t1=time.time()
    result = True if tgt in b else False
    t2=time.time()
    print(fmt.format(name,t2-t1,result))

    name='gen over numpy'
    t1=time.time()
    result=any((tgt == x).all() for x in a)
    t2=time.time()
    print(fmt.format(name,t2-t1,result))

    name='logic equal'
    t1=time.time()
    result=np.equal(a,tgt).all(1).any()   # store the result so the line below reports this method, not the previous one
    t2=time.time()
    print(fmt.format(name,t2-t1,result))
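Single time.time() measurements like the ones above can be noisy. As a hedged alternative, the two vectorized variants could be compared with the standard-library timeit module, taking the best of several repeats. The repeat counts and the candidates dictionary are arbitrary choices for this sketch.

```python
import timeit
import numpy as np

n = 300_000
a = np.arange(n * 3).reshape(n, 3)
tgt = [0, 2, 0]  # the "miss" case, so the whole array is always scanned

candidates = {
    "view": lambda: (a[...] == tgt).all(1).any(),
    "logic equal": lambda: np.equal(a, tgt).all(1).any(),
}

number = 5  # calls per timing run
best = {}
for name, func in candidates.items():
    # Best of 3 runs, divided by the calls per run, gives seconds per call.
    best[name] = min(timeit.repeat(func, number=number, repeat=3)) / number
    print(f"{name:20}{best[name]:.5f} seconds per call")
```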

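You can see that, hit or miss, the numpy routines take the same time to search the array, because they always scan all of it. The Python in operator is potentially much faster for an early hit, and the generator is bad news if you have to go all the way through the array.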
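One possible compromise between these trade-offs, which is not part of the timings in this answer, is to run the vectorized comparison on blocks of rows and stop at the first block that contains a match. The function name contains_row_chunked and the default chunk_rows value are assumptions for illustration, and the best block size would need to be measured.

```python
import numpy as np

def contains_row_chunked(arr, row, chunk_rows=4096):
    """Search a 2-D array block by block, returning on the first hit."""
    arr = np.asarray(arr)
    row = np.asarray(row)
    for start in range(0, arr.shape[0], chunk_rows):
        # Slicing yields a view, so no rows are copied here.
        block = arr[start:start + chunk_rows]
        if (block == row).all(axis=1).any():
            return True  # skip the remaining blocks
    return False

n = 300_000
a = np.arange(n * 3).reshape(n, 3)
print(contains_row_chunked(a, [9000, 9001, 9002]))
print(contains_row_chunked(a, [0, 2, 0]))
```

With an early hit only the first few blocks are compared. On a miss every block is still scanned, so the cost is close to the fully vectorized version plus some Python loop overhead.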

Here are the results for a 300,000 x 3 element array:

early hit: [9000, 9001, 9002] in 300,000 elements:
    view                0.01002 seconds and is True
    python list         0.00305 seconds and is True
    gen over numpy      0.06470 seconds and is True
    logic equal         0.00909 seconds and is True

middle hit: [450000, 450001, 450002] in 300,000 elements:
    view                0.00915 seconds and is True
    python list         0.15458 seconds and is True
    gen over numpy      3.24386 seconds and is True
    logic equal         0.00937 seconds and is True

late hit: [899970, 899971, 899972] in 300,000 elements:
    view                0.00936 seconds and is True
    python list         0.30604 seconds and is True
    gen over numpy      6.47660 seconds and is True
    logic equal         0.00965 seconds and is True

miss: [0, 2, 0] in 300,000 elements:
    view                0.00936 seconds and is False
    python list         0.01287 seconds and is False
    gen over numpy      6.49190 seconds and is False
    logic equal         0.00965 seconds and is False

And for a 3,000,000 x 3 array:

early hit: [90000, 90001, 90002] in 3,000,000 elements:
    view                0.10128 seconds and is True
    python list         0.02982 seconds and is True
    gen over numpy      0.66057 seconds and is True
    logic equal         0.09128 seconds and is True

middle hit: [4500000, 4500001, 4500002] in 3,000,000 elements:
    view                0.09331 seconds and is True
    python list         1.48180 seconds and is True
    gen over numpy      32.69874 seconds and is True
    logic equal         0.09438 seconds and is True

late hit: [8999970, 8999971, 8999972] in 3,000,000 elements:
    view                0.09868 seconds and is True
    python list         3.01236 seconds and is True
    gen over numpy      65.15087 seconds and is True
    logic equal         0.09591 seconds and is True

miss: [0, 2, 0] in 3,000,000 elements:
    view                0.09588 seconds and is False
    python list         0.12904 seconds and is False
    gen over numpy      64.46789 seconds and is False
    logic equal         0.09671 seconds and is False

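This seems to indicate that np.equal is the fastest pure numpy way to do this, although in the results above its timings are nearly identical to those of the view approach.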

Regarding python - Testing whether a Numpy array contains a given row, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14766194/
