python - What is the most efficient way to find the position of the first np.nan value?

Consider the array a

a = np.array([3, 3, np.nan, 3, 3, np.nan])

I can do

np.isnan(a).argmax()

But this has to check every element for np.nan just to find the first one.
Is there a more efficient way?

I have been trying to figure out whether I can pass a parameter to np.argpartition so that np.nan gets sorted first instead of last.
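To make the cost concrete, here is a minimal sketch of what the one-liner does step by step. The variable names (mask, first, b, no_nan_result) are illustrative and not from the original post. np.isnan(a) evaluates every element and allocates a full boolean mask before argmax ever runs, and the result is ambiguous when the array contains no NaN at all.

```python
import numpy as np

a = np.array([3, 3, np.nan, 3, 3, np.nan])

# np.isnan builds a boolean array with one entry per element of a.
mask = np.isnan(a)

# argmax on a boolean array gives the index of the first True.
first = int(mask.argmax())

# Pitfall: without any NaN the mask is all False, and argmax still returns 0,
# which is indistinguishable from "NaN at index 0".
b = np.array([1.0, 2.0, 3.0])
no_nan_result = int(np.isnan(b).argmax())
```

Any approach built on this one-liner therefore needs a separate check (for example mask.any()) if the input might contain no NaN.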

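Edit regarding [dup].
This question is different for a couple of reasons.

That question and its answers address equality of values. This one is about isnan.
Those answers all suffer from the same problem as my own solution. Note that I provided a perfectly valid solution but highlighted its inefficiency. I am looking to fix the inefficiency.

Edit regarding the second [dup].

It still addresses equality, and the question and answers are old and quite possibly outdated.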

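Best answer

It might also be worth looking into numba.jit. Without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but once the code is compiled, the plain search takes the lead, at least in my testing: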

In [63]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])

In [70]: %paste
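import numba

# Pure-Python loop: stops at the first NaN, but pays interpreter overhead per element.
def naive(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

# Vectorized: builds the full boolean mask, then takes the first True.
def short(a):
    return np.isnan(a).argmax()

# Same loop as naive, compiled by numba.
@numba.jit
def naive_jit(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

# Same vectorized expression as short, compiled by numba.
@numba.jit
def short_jit(a):
    return np.isnan(a).argmax()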
## -- End pasted text --

In [71]: %timeit naive(a)
100 loops, best of 3: 7.22 ms per loop

In [72]: %timeit short(a)
The slowest run took 4.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 37.7 µs per loop

In [73]: %timeit naive_jit(a)
The slowest run took 6821.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.79 µs per loop

In [74]: %timeit short_jit(a)
The slowest run took 395.51 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop
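If numba is not an option, one possible middle ground between naive and short is to scan the array in blocks, so that the vectorized isnan only runs until the first block containing a NaN. This is a sketch that is not part of the original answer: the function name first_nan_chunked and the default chunk size of 4096 are assumptions chosen for illustration, and it has not been timed against the versions above.

```python
import numpy as np

def first_nan_chunked(a, chunk=4096):
    """Return the index of the first NaN in a 1-D array, or -1 if there is none.

    Hypothetical helper: checks one block at a time and stops at the first
    block that contains a NaN.
    """
    for start in range(0, len(a), chunk):
        # Slicing a NumPy array gives a view, so no data is copied here.
        block_mask = np.isnan(a[start:start + chunk])
        if block_mask.any():
            # argmax on a boolean mask is the offset of the first True.
            return start + int(block_mask.argmax())
    return -1
```

Whether this is faster than the one-liner depends on where the first NaN sits and on the chunk size, so it should be measured on representative data.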

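Edit: as @hpaulj points out in their answer, numpy actually ships with an optimized, short-circuiting search whose performance is comparable to the JITted search above. On a float array that contains NaN, a.argmax() returns the index of the first NaN: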

In [26]: %paste
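# argmax applied directly to the float array, without building an isnan mask.
def plain(a):
    return a.argmax()

# The same call compiled by numba.
@numba.jit
def plain_jit(a):
    return a.argmax()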
## -- End pasted text --

In [35]: %timeit naive(a)
100 loops, best of 3: 7.13 ms per loop

In [36]: %timeit plain(a)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.04 µs per loop

In [37]: %timeit naive_jit(a)
100000 loops, best of 3: 6.91 µs per loop

In [38]: %timeit plain_jit(a)
10000 loops, best of 3: 125 µs per loop
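One caveat with plain is that a.argmax() only points at a NaN when the array actually contains one; otherwise it returns the index of the ordinary maximum. A minimal sketch of a guarded wrapper follows. The name first_nan_argmax and the -1 sentinel are assumptions made for this example, not part of the original answer, and it relies on argmax returning the first NaN position for floating-point input.

```python
import numpy as np

def first_nan_argmax(a):
    """Return the index of the first NaN in a 1-D float array, or -1 if none.

    Hypothetical helper built on a.argmax().
    """
    if a.size == 0:
        # argmax raises ValueError on an empty array, so handle it explicitly.
        return -1
    i = int(a.argmax())
    # Without any NaN, i is just the position of the maximum, so verify it.
    return i if np.isnan(a[i]) else -1
```

For example, first_nan_argmax(np.array([3, 3, np.nan, 3, 3, np.nan])) gives 2, while an array without NaN gives -1.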

A similar question to "python - What is the most efficient way to find the position of the first np.nan value?" can be found on Stack Overflow: https://stackoverflow.com/questions/41320568/
