Consider the array a:
a = np.array([3, 3, np.nan, 3, 3, np.nan])
I can do
np.isnan(a).argmax()
but this requires finding every np.nan just to find the first one.
Is there a more efficient way?
I've been trying to figure out whether I can pass a parameter to np.argpartition so that np.nan gets sorted first rather than last.
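For reference, here is a minimal sketch of the baseline approach from the question, with one extra guard. The helper name `first_nan_index` and the `-1` sentinel are assumptions made for illustration; they are not part of the original post. The guard matters because `argmax` on an all-False mask returns 0, which is indistinguishable from "the first element is NaN".

```python
import numpy as np

def first_nan_index(a):
    """Return the index of the first NaN in a 1-D array, or -1 if there is none.

    Hypothetical helper: the name and the -1 sentinel are illustrative choices.
    """
    mask = np.isnan(a)          # scans the whole array, as noted in the question
    if mask.size == 0:
        return -1               # argmax raises on an empty array
    idx = int(mask.argmax())    # first True, or 0 when the mask is all False
    return idx if mask[idx] else -1

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(first_nan_index(a))
```

This still builds the full boolean mask, so it does not address the efficiency concern; it only makes the "no NaN present" case explicit.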
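Edit regarding [dup].
There are several reasons this question is different.
- That question and its answers address equality of values. This one is about isnan.
- Those answers all suffer from the same problem my own answer has. Note that I provided a perfectly valid answer but pointed out its inefficiency. I'm looking to fix that inefficiency.
Edit regarding the second [dup].
It still addresses equality, and the question and answers are old and quite possibly outdated.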
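Best answer
It might also be worth looking into numba.jit. Without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but once the code is compiled, the plain search takes the lead, at least in my testing: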
In [63]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [70]: %paste
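import numba

# Pure-Python loop: stops at the first NaN (returns None if there is none)
def naive(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

# Vectorized: builds the full boolean mask, then takes the first True
def short(a):
    return np.isnan(a).argmax()

# Same loop as naive(), compiled with numba
@numba.jit
def naive_jit(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

# Same expression as short(), compiled with numba
@numba.jit
def short_jit(a):
    return np.isnan(a).argmax()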
## -- End pasted text --
In [71]: %timeit naive(a)
100 loops, best of 3: 7.22 ms per loop
In [72]: %timeit short(a)
The slowest run took 4.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 37.7 µs per loop
In [73]: %timeit naive_jit(a)
The slowest run took 6821.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.79 µs per loop
In [74]: %timeit short_jit(a)
The slowest run took 395.51 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop
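If numba is not available, one NumPy-only way to approximate short-circuit behaviour is to scan the array in blocks and stop at the first block that contains a NaN. This is a different technique from the ones timed above and was not part of the original benchmark; the helper name `first_nan_chunked`, the `chunk_size` default, and the `-1` sentinel are all assumptions chosen for illustration.

```python
import numpy as np

def first_nan_chunked(a, chunk_size=4096):
    """Return the index of the first NaN in a 1-D array, or -1 if none is found.

    Hypothetical helper: scans block by block so that, when a NaN occurs
    early, only the blocks up to and including it are examined.
    """
    for start in range(0, a.size, chunk_size):
        mask = np.isnan(a[start:start + chunk_size])  # mask for this block only
        if mask.any():
            return start + int(mask.argmax())         # offset back to a global index
    return -1

# Same test array as in the session above: NaN at every index i with i % 10000 == 9999
a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
print(first_nan_chunked(a))
```

The block size is a trade-off: smaller blocks stop sooner after the first NaN but pay more Python-level loop overhead, so any real choice would need to be timed on the data at hand.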
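Edit: As @hpaulj pointed out in their answer, numpy actually ships with an optimized short-circuiting search whose performance is comparable to the JITted search above: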
In [26]: %paste
# argmax on the float array itself: no intermediate isnan mask
def plain(a):
    return a.argmax()

# Same call, compiled with numba
@numba.jit
def plain_jit(a):
    return a.argmax()
## -- End pasted text --
In [35]: %timeit naive(a)
100 loops, best of 3: 7.13 ms per loop
In [36]: %timeit plain(a)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.04 µs per loop
In [37]: %timeit naive_jit(a)
100000 loops, best of 3: 6.91 µs per loop
In [38]: %timeit plain_jit(a)
10000 loops, best of 3: 125 µs per loop
Regarding "python - What is the most efficient way to find the position of the first np.nan value?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41320568/