python - Speeding up nested for loops in Python / going through numpy arrays

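Say I have 4 numpy arrays A, B, C, D, each of size (256,256,1792). I want to go through every element of these arrays and do something with it, but I need to do it in blocks of 256x256x256 cubes.

My code looks like this: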

for l in range(7): 
    x, y, z, t = 0,0,0,0
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l,256*(l+1)):
                t += D[m,n,o] * constant
                x += A[m,n,o] * D[m,n,o] * constant
                y += B[m,n,o] * D[m,n,o] * constant
                z += C[m,n,o] * D[m,n,o] * constant
    final = (x+y+z)/t
    doOutput(final)

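This code works and outputs what I want, but it is very slow. I read online that nested for loops like this should be avoided in Python. What is the cleanest solution? (Right now I am trying to write this part of my code in C and import it somehow via Cython or another tool, but I would prefer a pure Python solution.)

Thanks

Addition

Willem Van Onsem's solution for the first part seems to work just fine, and I think I understand it. But now I want to modify my values before summing them. It looks like this

(inside the outer l loop)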

for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l,256*(l+1)):
            R += (D[m,n,o] * constant * (A[m,n,o]**2 
            + B[m,n,o]**2 + C[m,n,o]**2)/t - final**2)
doOutput(R)

I obviously cannot just square the sum, x = (A[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()**2*constant, because (A²+B²) != (A+B)². How can I redo this last for loop?

Best answer

Since you update t for every element with m in range(a.shape[0]), n in range(a.shape[1]) and o in range(256*l,256*(l+1)), you can replace:

for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l,256*(l+1)):
            t += D[m,n,o]

with:

t += D[:a.shape[0],:a.shape[1],256*l:256*(l+1)].sum()
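
As a quick sanity check of that replacement, the sketch below compares the triple loop with the slice sum on a small random array. The array shape, the block depth of 4 and the seed are assumptions chosen only to keep the loop fast; they do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
D = rng.random((4, 4, 12))          # small stand-in for the (256, 256, 1792) array
block = 4                           # stand-in for the block depth of 256
l = 1                               # any block index in range(D.shape[2] // block)

# Explicit triple loop, as in the question
t_loop = 0.0
for m in range(D.shape[0]):
    for n in range(D.shape[1]):
        for o in range(block * l, block * (l + 1)):
            t_loop += D[m, n, o]

# Same sum computed by numpy over the slice
t_vec = D[:, :, block * l:block * (l + 1)].sum()

# The two agree up to floating-point rounding
print(np.isclose(t_loop, t_vec))
```

The comparison uses np.isclose rather than ==, since numpy may add the elements in a different order than the loop does.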

The same goes for the other assignments. So you can rewrite your code as:

for l in range(7): 
    Dsub = D[:a.shape[0],:a.shape[1],256*l:256*(l+1)]
    x = (A[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    y = (B[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    z = (C[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*Dsub).sum()*constant
    t = Dsub.sum()*constant
    final = (x+y+z)/t
    doOutput(final)

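Note that * in numpy is element-wise multiplication, not the matrix product. You could multiply by constant before summing, but since a sum of terms each multiplied by a constant equals that constant times the sum, I think it is more efficient to do the multiplication once, after the sum.

If a.shape[0] is equal to D.shape[0], and so on, you can use : instead of :a.shape[0]. Based on your question, that seems to be the case. So: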
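
Both points can be illustrated on a tiny example. This is a minimal sketch: the 2x2 arrays and the value of constant are made up for illustration and are not taken from the question.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
D = np.array([[5.0, 6.0], [7.0, 8.0]])
constant = 0.5                      # hypothetical value

elementwise = A * D                 # multiplies matching entries
matrix = A @ D                      # matrix product, a different operation

# Scaling every term before the sum, or scaling the sum once afterwards
inside = (A * D * constant).sum()
outside = (A * D).sum() * constant

print(elementwise)                  # [[ 5. 12.] [21. 32.]]
print(matrix)                       # [[19. 22.] [43. 50.]]
print(inside, outside)              # 35.0 35.0
```

Scaling after the sum needs a single multiplication, while scaling before the sum multiplies every element and allocates one more temporary array.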

# only when `a.shape[0] == D.shape[0], a.shape[1] == D.shape[1] (and so for A, B and C)`
for l in range(7): 
    Dsub = D[:,:,256*l:256*(l+1)]
    x = (A[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    y = (B[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    z = (C[:,:,256*l:256*(l+1)]*Dsub).sum()*constant
    t = Dsub.sum()*constant
    final = (x+y+z)/t
    doOutput(final)

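Doing the .sum() at the numpy level boosts performance: values are not converted back and forth between numpy and Python objects, and .sum() runs as a tight loop in compiled code.

EDIT:

Your updated question does not change much. You can simply use: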
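
Going one step further than the answer, the remaining loop over l can also be moved into numpy by reshaping the last axis into (number of blocks, block depth) and summing over every axis except the block axis. This is only a sketch and not part of the original answer: the small shape, the block depth of 4, the value of constant and the helper name block_sums are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 4, 12)                  # small stand-in for (256, 256, 1792)
block = 4                           # stand-in for the block depth of 256
n_blocks = shape[2] // block        # 7 in the question, 3 here
A, B, C, D = (rng.random(shape) for _ in range(4))
constant = 2.0                      # hypothetical value

def block_sums(arr):
    """Sum arr over each run of `block` consecutive slices along the last axis."""
    split = arr.reshape(shape[0], shape[1], n_blocks, block)
    return split.sum(axis=(0, 1, 3))  # one sum per block

t = block_sums(D) * constant
finals = (block_sums(A * D) + block_sums(B * D) + block_sums(C * D)) * constant / t

# Cross-check against the per-block loop
expected = []
for l in range(n_blocks):
    lo, hi = block * l, block * (l + 1)
    Dsub = D[:, :, lo:hi]
    x = (A[:, :, lo:hi] * Dsub).sum() * constant
    y = (B[:, :, lo:hi] * Dsub).sum() * constant
    z = (C[:, :, lo:hi] * Dsub).sum() * constant
    expected.append((x + y + z) / (Dsub.sum() * constant))

print(np.allclose(finals, expected))
```

Here finals holds one value per block, so each entry could be passed to doOutput in turn. For only 7 blocks the gain over the sliced loop is likely small, so the loop version may be preferable for readability.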

m,n,*_ = a.shape
lo,hi = 256*l,256*(l+1)
R = (D[:m,:n,lo:hi]*constant*(A[:m,:n,lo:hi]**2+B[:m,:n,lo:hi]**2+C[:m,:n,lo:hi]**2)/t-final**2).sum()
doOutput(R)
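
A vectorized expression for R can be checked against the loop from the question on small inputs. In this sketch the array shape, the block depth of 4, the seed and the value of constant are assumptions for illustration; t and final are computed as in the first part of the answer.

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (3, 3, 8)                   # small stand-in for (256, 256, 1792)
block = 4                           # stand-in for the block depth of 256
A, B, C, D = (rng.random(shape) for _ in range(4))
constant = 2.0                      # hypothetical value
l = 1
lo, hi = block * l, block * (l + 1)

Asub, Bsub, Csub, Dsub = (arr[:, :, lo:hi] for arr in (A, B, C, D))
t = Dsub.sum() * constant
final = ((Asub + Bsub + Csub) * Dsub).sum() * constant / t

# Loop from the question: final**2 is subtracted once per element
R_loop = 0.0
for m in range(shape[0]):
    for n in range(shape[1]):
        for o in range(lo, hi):
            R_loop += (D[m, n, o] * constant
                       * (A[m, n, o]**2 + B[m, n, o]**2 + C[m, n, o]**2) / t
                       - final**2)

# Vectorized form: square element-wise first, then sum
R_vec = (Dsub * constant * (Asub**2 + Bsub**2 + Csub**2) / t - final**2).sum()

print(np.isclose(R_loop, R_vec))
```

Because the squares are taken element-wise before the sum, this avoids the (A²+B²) != (A+B)² problem raised in the question.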

This question about speeding up nested for loops in Python / going through numpy arrays comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/43068785/
