python - 高效循环 numpy 数组

这个问题的版本已经被问过，但我还没有找到满意的答案。

问题:给定一个大的 numpy 向量，找到重复的向量元素的索引(其变体可以与公差进行比较)。

所以问题是 ~O(N^2) 和内存限制(至少从当前算法的角度来看)。我想知道为什么我尝试过的 Python 比等效的 C 代码慢 100 倍或更多。

import numpy as np
N = 10000
vect = np.arange(float(N))
vect[N/2] = 1
vect[N/4] = 1
dupl = []
print("init done")
counter = 0
for i in range(N):
    for j in range(i+1, N):
        if vect[i] == vect[j]:
            dupl.append(j)
            counter += 1

print("counter =", counter)
print(dupl)
# For simplicity, this code ignores repeated indices 
# which can be trimmed later. Ref output is
# counter = 3
# [2500, 5000, 5000]

我尝试使用 numpy 迭代器，但它们更糟糕 (~ x4-5) http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html

使用 N=10,000 我在 C 中获得 0.1 秒，在 Python 中获得 12 秒(上面的代码)，在 Python 中使用 np.nditer 获得 40 秒，在 Python 中使用 np.ndindex 获得 50 秒。我将其推到 N=160,000，时间比例按预期缩放为 N^2。

最佳答案

由于答案已经停止，而且没有一个是完全令人满意的，为了记录在案，我发布了我自己的解决方案。

据我了解，在这种情况下是赋值导致 Python 变慢，而不是我最初认为的嵌套循环。使用库或编译代码消除了分配的需要，性能显着提高。

from __future__ import print_function
import numpy as np
from numba import jit

N = 10000
vect = np.arange(N, dtype=np.float32)

vect[N/2] = 1
vect[N/4] = 1
dupl = np.zeros(N, dtype=np.int32)

print("init done")
# uncomment to enable compiled function
#@jit
def duplicates(i, counter, dupl, vect):
    eps = 0.01
    ns = len(vect)
    for j in range(i+1, ns):
        # replace if to use approx comparison
        #if abs(vect[i] - vect[j]) < eps:
        if vect[i] == vect[j]:
            dupl[counter] = j
            counter += 1
    return counter

counter = 0
for i in xrange(N):
    counter = duplicates(i, counter, dupl, vect)

print("counter =", counter)
print(dupl[0:counter])

测试

# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 10.135 s

# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 0.480 s

编译版本(@jit 未注释)的性能接近 C 代码性能 ~0.1 - 0.2 秒。也许消除最后一个循环可以进一步提高性能。使用 eps 进行近似比较时，性能差异更大，而编译版本差异很小。

# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 109.218 s

# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 0.506 s

这是大约 200 倍的差异。在实际代码中，我必须将两个循环都放在函数中，并使用具有变量类型的函数模板，因此它有点复杂，但不是很复杂。

关于python - 高效循环 numpy 数组，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/39371021/

python - 高效循环 numpy 数组

上一篇：python - 如何在嵌套列表中获取最大字符串长度

下一篇：python - pandas read_csv 并只保留某些行(python)