python - Optimizing matrix writes in python/numpy

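I am currently trying to optimize a piece of code whose gist is that we loop through and compute a bunch of values and write them into a matrix. The order of computation does not matter: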

mat =  np.zeros((n, n))
mat.fill(MAX_VAL)
for i in xrange(0, smallerDim):
    for j in xrange(0,n):
        similarityVal = doACalculation(i,j, data, cache)
        mat[i][j] = abs(1.0 / (similarityVal + 1.0))

I profiled this code and found that roughly 90% of the time is spent writing the value back into the matrix (the last line).

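I would like to know the best way to do this kind of computation so that the writes are optimized. Should I write into an intermediate buffer and copy in whole rows, or something along those lines? I am a bit clueless when it comes to performance tuning or numpy internals.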
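One way to read the "intermediate buffer" idea is sketched below: collect one row of results in a plain Python list, then transform and store the whole row with a single assignment. This is only a sketch under stated assumptions. `do_a_calculation` is a hypothetical stand-in (the real doACalculation is not shown), and the value of `MAX_VAL` is made up. Whether this actually helps would need to be confirmed by profiling the real code.

```python
import numpy as np

MAX_VAL = 1e9  # hypothetical sentinel; the question does not give its value


def do_a_calculation(i, j):
    """Hypothetical stand-in for doACalculation, which is not shown."""
    return float(i + j)


def fill_per_element(n, smaller_dim):
    """Baseline: one matrix write per computed value."""
    mat = np.full((n, n), MAX_VAL)
    for i in range(smaller_dim):
        for j in range(n):
            mat[i, j] = abs(1.0 / (do_a_calculation(i, j) + 1.0))
    return mat


def fill_per_row(n, smaller_dim):
    """Buffer one row in a Python list, then write it in one assignment."""
    mat = np.full((n, n), MAX_VAL)
    for i in range(smaller_dim):
        row = [do_a_calculation(i, j) for j in range(n)]
        # One vectorized transform and one row write per outer iteration.
        mat[i, :] = np.abs(1.0 / (np.asarray(row) + 1.0))
    return mat
```

Both functions should produce the same matrix; the second simply replaces n individual writes per row with a single row assignment.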

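Edit: doACalculation is not a side-effect-free function. It takes in some data (assume this is some Python object) as well as a cache that it writes to and reads some intermediate steps from. I am not sure it can easily be vectorized. I tried using numpy.vectorize as suggested, but did not see a significant speedup over the naive for loop. (I passed in the extra data via a state variable):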

Best answer

Wrapping it in numba autojit should improve performance considerably.

import numba
import numpy as np


def doACalculationVector(n, smallerDim):
    return np.ones((smallerDim, n)) + 1


def testVector():
    n = 1000
    smallerDim = 800
    mat =  np.zeros((n, n))
    mat.fill(10) 
    mat[:smallerDim] = abs(1.0 / (doACalculationVector(n, smallerDim) + 1.0))
    return mat

@numba.autojit
def doACalculationNumba(i,j):
    return 2

@numba.autojit
def testNumba():
    n = 1000
    smallerDim = 800
    mat =  np.zeros((n, n))
    mat.fill(10)
    for i in xrange(0, smallerDim):
        for j in xrange(0, n):
            mat[i,j] = abs(1.0 / (doACalculationNumba(i, j) + 1.0))
    return mat

Timing of the original for reference (with mat[i][j] changed to mat[i,j]):

In [24]: %timeit test()
1 loops, best of 3: 226 ms per loop
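As a small illustration of why the indexing style can matter: `mat[i][j]` is two indexing steps (it first builds a temporary view of row `i`, then indexes into it), while `mat[i, j]` is a single indexing operation. The snippet below is only a sketch to show that both forms write to the same memory; it makes no claim about how large the speed difference is.

```python
import numpy as np

mat = np.zeros((3, 3))

# Chained indexing: mat[0] first builds a temporary 1-D view of row 0,
# then [1] = ... writes through that view.
mat[0][1] = 5.0

# Tuple indexing: one indexing operation, no temporary row view.
mat[0, 2] = 7.0

# The row view shares memory with the matrix, so both writes land in mat.
row_view = mat[0]
shares_memory = np.shares_memory(row_view, mat)
```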

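Note that I simplified the functions somewhat, since this is all that was provided. Even so, when timed, testNumba is roughly 40 times faster than test and roughly 3 times faster than the vectorized version: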

In [20]: %timeit testVector()
100 loops, best of 3: 17.9 ms per loop

In [21]: %timeit testNumba()
100 loops, best of 3: 5.91 ms per loop

Regarding python - Optimizing matrix writes in python/numpy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20777877/
