是否有一种 Pythonic 和有效的方法来检查 Numpy 数组是否包含给定行的至少一个实例? “高效”是指它在找到第一个匹配行时终止,而不是遍历整个数组,即使已经找到结果。
对于 Python 数组,这可以通过 if row in array:
非常干净地完成,但这并不像我对 Numpy 数组所期望的那样工作,如下所示。
使用 Python 数组:
>>> a = [[1,2],[10,20],[100,200]]
>>> [1,2] in a
True
>>> [1,20] in a
False
但是 Numpy 数组给出了不同且相当奇怪的结果。 (ndarray
的 __contains__
方法似乎没有记录。)
>>> a = np.array([[1,2],[10,20],[100,200]])
>>> np.array([1,2]) in a
True
>>> np.array([1,20]) in a
True
>>> np.array([1,42]) in a
True
>>> np.array([42,1]) in a
False
最佳答案
你可以使用 .tolist()
>>> a = np.array([[1,2],[10,20],[100,200]])
>>> [1,2] in a.tolist()
True
>>> [1,20] in a.tolist()
False
>>> [1,20] in a.tolist()
False
>>> [1,42] in a.tolist()
False
>>> [42,1] in a.tolist()
False
或者使用 View :
>>> any((a[:]==[1,2]).all(1))
True
>>> any((a[:]==[1,20]).all(1))
False
或者通过 numpy 列表生成(可能非常慢):
any(([1,2] == x).all() for x in a) # stops on first occurrence
或者使用numpy逻辑函数:
any(np.equal(a,[1,2]).all(1))
如果你计时:
import numpy as np
import time
n=300000
a=np.arange(n*3).reshape(n,3)
b=a.tolist()
t1,t2,t3=a[n//100][0],a[n//2][0],a[-10][0]
tests=[ ('early hit',[t1, t1+1, t1+2]),
('middle hit',[t2,t2+1,t2+2]),
('late hit', [t3,t3+1,t3+2]),
('miss',[0,2,0])]
fmt='\t{:20}{:.5f} seconds and is {}'
for test, tgt in tests:
print('\n{}: {} in {:,} elements:'.format(test,tgt,n))
name='view'
t1=time.time()
result=(a[...]==tgt).all(1).any()
t2=time.time()
print(fmt.format(name,t2-t1,result))
name='python list'
t1=time.time()
result = True if tgt in b else False
t2=time.time()
print(fmt.format(name,t2-t1,result))
name='gen over numpy'
t1=time.time()
result=any((tgt == x).all() for x in a)
t2=time.time()
print(fmt.format(name,t2-t1,result))
name='logic equal'
t1=time.time()
np.equal(a,tgt).all(1).any()
t2=time.time()
print(fmt.format(name,t2-t1,result))
你可以看到hit or miss,numpy的例程搜索数组的速度是一样的。 Python in
运算符可能在早期命中时要快得多,如果你必须一直遍历数组,生成器就是个坏消息。
以下是 300,000 x 3 元素数组的结果:
early hit: [9000, 9001, 9002] in 300,000 elements:
view 0.01002 seconds and is True
python list 0.00305 seconds and is True
gen over numpy 0.06470 seconds and is True
logic equal 0.00909 seconds and is True
middle hit: [450000, 450001, 450002] in 300,000 elements:
view 0.00915 seconds and is True
python list 0.15458 seconds and is True
gen over numpy 3.24386 seconds and is True
logic equal 0.00937 seconds and is True
late hit: [899970, 899971, 899972] in 300,000 elements:
view 0.00936 seconds and is True
python list 0.30604 seconds and is True
gen over numpy 6.47660 seconds and is True
logic equal 0.00965 seconds and is True
miss: [0, 2, 0] in 300,000 elements:
view 0.00936 seconds and is False
python list 0.01287 seconds and is False
gen over numpy 6.49190 seconds and is False
logic equal 0.00965 seconds and is False
对于 3,000,000 x 3 数组:
early hit: [90000, 90001, 90002] in 3,000,000 elements:
view 0.10128 seconds and is True
python list 0.02982 seconds and is True
gen over numpy 0.66057 seconds and is True
logic equal 0.09128 seconds and is True
middle hit: [4500000, 4500001, 4500002] in 3,000,000 elements:
view 0.09331 seconds and is True
python list 1.48180 seconds and is True
gen over numpy 32.69874 seconds and is True
logic equal 0.09438 seconds and is True
late hit: [8999970, 8999971, 8999972] in 3,000,000 elements:
view 0.09868 seconds and is True
python list 3.01236 seconds and is True
gen over numpy 65.15087 seconds and is True
logic equal 0.09591 seconds and is True
miss: [0, 2, 0] in 3,000,000 elements:
view 0.09588 seconds and is False
python list 0.12904 seconds and is False
gen over numpy 64.46789 seconds and is False
logic equal 0.09671 seconds and is False
这似乎表明 np.equal
是最快的纯 numpy 方法...
关于python - 测试 Numpy 数组是否包含给定的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/14766194/