python - Testing whether a Numpy array contains a given row

Tags: python numpy

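Is there a Pythonic and efficient way to check whether a Numpy array contains at least one instance of a given row? By "efficient" I mean that it terminates upon finding the first matching row, rather than iterating over the entire array even after a result has been found.

With Python lists this can be done very cleanly with if row in array:, but this does not work as I would expect for Numpy arrays, as illustrated below.

With Python lists: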

>>> a = [[1,2],[10,20],[100,200]]
>>> [1,2] in a
True
>>> [1,20] in a
False

But Numpy arrays give different and rather odd-looking results. (The ndarray __contains__ method seems to be undocumented.)

>>> a = np.array([[1,2],[10,20],[100,200]])
>>> np.array([1,2]) in a
True
>>> np.array([1,20]) in a
True
>>> np.array([1,42]) in a
True
>>> np.array([42,1]) in a
False
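These results are consistent with NumPy treating x in a roughly like (a == x).any(): the row is broadcast against the whole array, and a single matching element in the right column is enough. The sketch below checks that reading on the example above. The equivalence is an assumption about current NumPy behaviour rather than a documented guarantee, and the results dictionary is only there for illustration.

```python
import numpy as np

a = np.array([[1, 2], [10, 20], [100, 200]])

results = {}
for row in ([1, 2], [1, 20], [1, 42], [42, 1]):
    x = np.array(row)
    # What the `in` operator reports for the ndarray.
    via_in = bool(x in a)
    # Broadcast comparison: True if any single element matches in its column.
    via_any = bool((a == x).any())
    # A real row test: every column of one row must match.
    row_match = bool((a == x).all(axis=1).any())
    results[tuple(row)] = (via_in, via_any, row_match)
    print(row, via_in, via_any, row_match)
```

The first two columns of the output agree for every row, while only the third one answers the question that was actually asked.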

Best answer

You can use .tolist():

>>> a = np.array([[1,2],[10,20],[100,200]])
>>> [1,2] in a.tolist()
True
>>> [1,20] in a.tolist()
False
>>> [1,42] in a.tolist()
False
>>> [42,1] in a.tolist()
False

Or use a view:

>>> any((a[:]==[1,2]).all(1))
True
>>> any((a[:]==[1,20]).all(1))
False

Or use a generator over the numpy array (potentially very slow):

any(([1,2] == x).all() for x in a)     # stops on first occurrence 

Or use the numpy logic functions:

any(np.equal(a,[1,2]).all(1))
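For reuse, the vectorized comparison can be wrapped in a small helper. This is a minimal sketch: the name contains_row and the shape check are my own additions, not part of the original answer, and it assumes a 2-D array.

```python
import numpy as np

def contains_row(a, row):
    """Return True if any row of the 2-D array `a` equals `row` element-wise."""
    a = np.asarray(a)
    row = np.asarray(row)
    # Guard against silent broadcasting of a wrongly shaped row.
    if a.ndim != 2 or row.shape != (a.shape[1],):
        raise ValueError("expected a 2-D array and a row of matching length")
    # Compare every row, require all columns to match, then reduce.
    return bool((a == row).all(axis=1).any())

a = np.array([[1, 2], [10, 20], [100, 200]])
print(contains_row(a, [1, 2]))    # True
print(contains_row(a, [1, 20]))   # False
```

Note that this always scans the whole array, so it does not give the early termination asked for in the question.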

If you time these:

import numpy as np
import time

n=300000
a=np.arange(n*3).reshape(n,3)
b=a.tolist()

t1,t2,t3=a[n//100][0],a[n//2][0],a[-10][0]

tests=[ ('early hit',[t1, t1+1, t1+2]),
        ('middle hit',[t2,t2+1,t2+2]),
        ('late hit', [t3,t3+1,t3+2]),
        ('miss',[0,2,0])]

fmt='\t{:20}{:.5f} seconds and is {}'     

for test, tgt in tests:
    print('\n{}: {} in {:,} elements:'.format(test,tgt,n))

    name='view'
    t1=time.time()
    result=(a[...]==tgt).all(1).any()
    t2=time.time()
    print(fmt.format(name,t2-t1,result))

    name='python list'
    t1=time.time()
    result = True if tgt in b else False
    t2=time.time()
    print(fmt.format(name,t2-t1,result))

    name='gen over numpy'
    t1=time.time()
    result=any((tgt == x).all() for x in a)
    t2=time.time()
    print(fmt.format(name,t2-t1,result))

    name='logic equal'
    t1=time.time()
    result=np.equal(a,tgt).all(1).any()
    t2=time.time()
    print(fmt.format(name,t2-t1,result))

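You can see that, hit or miss, the numpy routines take the same time to search the array. The Python in operator is potentially much faster for an early hit, and the generator is bad news if you have to go all the way through the array.

Here are the results for a 300,000 x 3 element array: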
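One way to combine the two behaviours is to run the vectorized comparison on blocks of rows and stop at the first block that contains a match. This was not part of the original answer; it is a sketch under the assumption of a 2-D array, and both the function name contains_row_chunked and the default chunk_rows value are arbitrary choices for illustration.

```python
import numpy as np

def contains_row_chunked(a, row, chunk_rows=4096):
    """Search `a` block by block and return as soon as a block has a match."""
    a = np.asarray(a)
    row = np.asarray(row)
    for start in range(0, a.shape[0], chunk_rows):
        block = a[start:start + chunk_rows]  # basic slicing gives a view, not a copy
        if (block == row).all(axis=1).any():
            return True
    return False

n = 300_000
a = np.arange(n * 3).reshape(n, 3)
print(contains_row_chunked(a, [9000, 9001, 9002]))  # early hit -> True
print(contains_row_chunked(a, [0, 2, 0]))           # miss -> False
```

An early hit then touches only the first block, while a miss still costs a full pass plus some loop overhead. A suitable block size depends on the data and would need to be measured.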

early hit: [9000, 9001, 9002] in 300,000 elements:
    view                0.01002 seconds and is True
    python list         0.00305 seconds and is True
    gen over numpy      0.06470 seconds and is True
    logic equal         0.00909 seconds and is True

middle hit: [450000, 450001, 450002] in 300,000 elements:
    view                0.00915 seconds and is True
    python list         0.15458 seconds and is True
    gen over numpy      3.24386 seconds and is True
    logic equal         0.00937 seconds and is True

late hit: [899970, 899971, 899972] in 300,000 elements:
    view                0.00936 seconds and is True
    python list         0.30604 seconds and is True
    gen over numpy      6.47660 seconds and is True
    logic equal         0.00965 seconds and is True

miss: [0, 2, 0] in 300,000 elements:
    view                0.00936 seconds and is False
    python list         0.01287 seconds and is False
    gen over numpy      6.49190 seconds and is False
    logic equal         0.00965 seconds and is False

For a 3,000,000 x 3 array:

early hit: [90000, 90001, 90002] in 3,000,000 elements:
    view                0.10128 seconds and is True
    python list         0.02982 seconds and is True
    gen over numpy      0.66057 seconds and is True
    logic equal         0.09128 seconds and is True

middle hit: [4500000, 4500001, 4500002] in 3,000,000 elements:
    view                0.09331 seconds and is True
    python list         1.48180 seconds and is True
    gen over numpy      32.69874 seconds and is True
    logic equal         0.09438 seconds and is True

late hit: [8999970, 8999971, 8999972] in 3,000,000 elements:
    view                0.09868 seconds and is True
    python list         3.01236 seconds and is True
    gen over numpy      65.15087 seconds and is True
    logic equal         0.09591 seconds and is True

miss: [0, 2, 0] in 3,000,000 elements:
    view                0.09588 seconds and is False
    python list         0.12904 seconds and is False
    gen over numpy      64.46789 seconds and is False
    logic equal         0.09671 seconds and is False

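This seems to indicate that np.equal is the fastest pure numpy way to do this, although in the timings above the view approach is essentially tied with it.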
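Because the two numpy timings differ by less than a millisecond, a single time.time() measurement may not separate them reliably. A hedged sketch using the standard timeit module, which repeats each call and keeps the best run, is shown below. The repeat counts and the candidates and timings names are arbitrary choices for illustration.

```python
import timeit
import numpy as np

n = 300_000
a = np.arange(n * 3).reshape(n, 3)
tgt = [450000, 450001, 450002]  # the "middle hit" row from the test above

candidates = {
    "view": lambda: (a[...] == tgt).all(1).any(),
    "logic equal": lambda: np.equal(a, tgt).all(1).any(),
}

timings = {}
for name, fn in candidates.items():
    # Best of 5 repeats with 3 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(fn, number=3, repeat=5)) / 3
    print("{:20}{:.5f} seconds".format(name, timings[name]))
```

The absolute numbers depend on the machine and the NumPy version, so they should be read as a comparison between the two expressions only.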

Regarding python - Testing whether a Numpy array contains a given row, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38303381/
