python - Parallel code runs much slower in Python than in Matlab

I have a piece of code that does the following:

for each file (already read into RAM):
    call a function and obtain a result
add the results up and display

Each file can be analyzed in parallel. The function that analyzes each file is as follows:

import numpy as np

# Complexity = 1000*19*19 units of work
def fun(args):
    (a, b, p) = args
    for itr in range(1000):
        for i in range(19):
            for j in range(19):
                # In the real program, the random number generated here depends on
                # the latest values in cells (i-1, j), (i+1, j), (i, j-1) and (i, j+1)
                # of the a and b arrays
                u = np.random.rand()
                if u < p:
                    a[i, j] -= 1
                else:
                    b[i, j] += 1
    return a + b
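Much of the cost in this loop is likely Python-level overhead rather than arithmetic: each scalar `np.random.rand()` call and each `a[i, j]` access goes through NumPy's general machinery for a single element. A minimal sketch, assuming the 19x19 arrays are small enough to convert to nested lists, that uses the standard-library `random.random()` and plain Python ints instead. `fun_lists` is a hypothetical name, its random stream differs from NumPy's (same uniform distribution), and any speedup should be measured rather than assumed:

```python
import random

import numpy as np


def fun_lists(args):
    """Same update rule as fun, but on plain Python lists (hypothetical variant)."""
    a, b, p = args
    # Convert once to nested lists so the inner loop avoids NumPy scalar indexing.
    al = a.tolist()
    bl = b.tolist()
    n_rows, n_cols = len(al), len(al[0])
    rand = random.random  # local alias avoids an attribute lookup on every call
    for _ in range(1000):
        for i in range(n_rows):
            row_a, row_b = al[i], bl[i]
            for j in range(n_cols):
                if rand() < p:
                    row_a[j] -= 1
                else:
                    row_b[j] += 1
    # Convert back to an array so the caller can still add results with NumPy.
    return np.array(al) + np.array(bl)
```

Unlike the original `fun`, this version leaves the input arrays untouched, because it works on list copies.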

I am using the multiprocessing package for parallelism:

import numpy as np
import time
from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    t = time.time()
    args = [None] * 100
    for i in range(100):
        a = np.random.randint(2, size=(19, 19))
        b = np.random.randint(2, size=(19, 19))
        p = np.random.rand()
        args[i] = (a, b, p)
    # The with-block closes the pool once all results are collected
    with Pool(processes=cpu_count()) as pool:
        result = pool.map(fun, args)
    # Start at index 1 so that every result, including result[1], is added
    for i in range(1, 100):
        result[0] += result[i]
    print(result[0])
    print(time.time() - t)
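Separately from speed, with the `fork` start method (historically the default on Linux) every `Pool` worker starts with a copy of the parent's global NumPy random state, so different workers can draw identical sequences from `np.random.rand()`. A minimal sketch, assuming NumPy 1.17 or later for `np.random.SeedSequence` and `np.random.default_rng`, in which each task carries its own independently seeded `Generator`. `make_tasks` and `fun_seeded` are hypothetical names, and the `n_iter` parameter is added only so short runs are possible:

```python
import numpy as np


def make_tasks(n_tasks, seed=None):
    """Build (a, b, p, rng) tuples, each with its own independent Generator."""
    root = np.random.SeedSequence(seed)
    # One child seeds the task data; the remaining n_tasks seed the tasks.
    data_seed, *task_seeds = root.spawn(n_tasks + 1)
    data_rng = np.random.default_rng(data_seed)
    tasks = []
    for task_seed in task_seeds:
        a = data_rng.integers(0, 2, size=(19, 19))  # 0 or 1, like randint(2, ...)
        b = data_rng.integers(0, 2, size=(19, 19))
        p = data_rng.random()
        tasks.append((a, b, p, np.random.default_rng(task_seed)))
    return tasks


def fun_seeded(task, n_iter=1000):
    """Same update rule as fun, but drawing from the task's own Generator."""
    a, b, p, rng = task
    a, b = a.copy(), b.copy()  # leave the caller's arrays untouched
    for _ in range(n_iter):
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                if rng.random() < p:
                    a[i, j] -= 1
                else:
                    b[i, j] += 1
    return a + b
```

In the script above, `pool.map(fun_seeded, make_tasks(100))` would take the place of the argument-building loop and `pool.map(fun, args)`. `Generator` objects can be pickled, so each one travels to its worker along with the rest of the task.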

I wrote equivalent MATLAB code that uses parfor and calls fun in each parfor iteration:

tic
args = cell(100, 1);
r = cell(100, 1);
parfor i = 1:100
   a = randi(2, 19, 19);
   b = randi(2, 19, 19);
   p = rand();
   args{i}.a = a;
   args{i}.b = b;
   args{i}.p = p;
   r{i} = fun(args{i});
end

for i = 2:100
    r{1} = r{1} + r{i};
end
disp(r{1});
toc

fun is implemented as follows:

function [ ret ] = fun( args )
a = args.a;
b = args.b;
p = args.p;

for itr = 1:1000
    for i = 1:19
        for j = 1:19
            u = rand();
            if (u < p)
                a(i, j) = a(i, j) + -1;
            else
                b(i, j) = b(i, j) + 1;
            end
        end
    end
end
ret = a + b;
end

I found that MATLAB is very fast: on a dual-core processor it takes about 1.5 seconds, while the Python program takes about 33 to 34 seconds. Why is that?

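Edit: Many answers suggested that I vectorize the random number generation. In reality, the random number generated depends on the latest values of the 2-D arrays a and b; I only used a simple rand() call to keep the program short and readable. In my actual program, each random number is generated by looking at some of the horizontal and vertical neighbors of cell (i, j), so it cannot be vectorized.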
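Even if the update cannot be vectorized across cells, the uniform draws may still be moved out of the inner loop. A minimal sketch, assuming the neighbor dependence enters only through the threshold each uniform `u` is compared against (not through how `u` itself is drawn): all uniforms for one call come from a single `rng.random` call, while the cell updates stay sequential. `neighbor_threshold` is a hypothetical rule (half `p`, half the share of neighbors where `a > b`) used purely for illustration, and `fun_predrawn` is likewise a hypothetical name:

```python
import numpy as np


def neighbor_threshold(a, b, i, j, p):
    """Hypothetical rule: blend p with the share of neighbors where a > b."""
    n_rows, n_cols = a.shape
    count = hits = 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < n_rows and 0 <= nj < n_cols:
            count += 1
            hits += int(a[ni, nj] > b[ni, nj])
    if count == 0:  # only possible for a 1x1 grid
        return p
    return 0.5 * p + 0.5 * hits / count


def fun_predrawn(args, rng, n_iter=1000):
    """Sequential update with all uniforms drawn up front in one call."""
    a, b, p = args
    a, b = a.copy(), b.copy()
    # One vectorized draw replaces n_iter * rows * cols scalar rand() calls.
    u_all = rng.random((n_iter,) + a.shape)
    for itr in range(n_iter):
        u = u_all[itr]
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                # The threshold still reads the latest a and b, so order matters.
                if u[i, j] < neighbor_threshold(a, b, i, j, p):
                    a[i, j] -= 1
                else:
                    b[i, j] += 1
    return a + b
```

For 1000 iterations on a 19x19 grid the pre-drawn array holds 361,000 doubles (under 3 MB), so memory is not a concern at this size.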

Best answer

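Have you benchmarked the two implementations of fun in a non-parallel context? One of them may simply be much faster. In particular, the nested loops in the Python fun look like something that Matlab may turn into a faster vectorized solution, or that Matlab's JIT may optimize.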
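A minimal sketch of the serial benchmark suggested above; `time_serial` is a hypothetical helper that times any `fun`-style callable without a `Pool`, using inputs shaped like the question's `args`:

```python
import time

import numpy as np


def time_serial(func, n_calls=5, seed=0):
    """Call func on n_calls freshly generated (a, b, p) tuples, one after another.

    Returns the average wall-clock seconds per call. No Pool is involved, so
    the number isolates the cost of func itself.
    """
    rng = np.random.default_rng(seed)
    # Build inputs shaped like the question's args before starting the clock.
    inputs = [(rng.integers(0, 2, size=(19, 19)),
               rng.integers(0, 2, size=(19, 19)),
               rng.random())
              for _ in range(n_calls)]
    start = time.perf_counter()
    for args in inputs:
        func(args)
    return (time.perf_counter() - start) / n_calls
```

With the question's `fun` in scope, `time_serial(fun) * 100` estimates the serial time for all 100 files; timing one call of the MATLAB `fun` with `tic`/`toc` outside `parfor` gives the matching MATLAB number. If the serial per-call ratio alone is close to the observed 33 s versus 1.5 s gap, the slowdown comes from the function body rather than from `multiprocessing`.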

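Run both implementations under a profiler to see where they spend their time. Convert both implementations to non-parallel versions and profile those first, to make sure they perform equivalently before you introduce the complexity of parallelization.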
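A minimal profiling sketch with the standard-library `cProfile` and `pstats` modules. The question's `fun` is redefined here with an added `n_iter` parameter (an assumption, so the run stays short), and `profile_serial` is a hypothetical helper; the sorted report shows how much time goes to the `rand` calls versus the loop body. On the MATLAB side, `profile on` followed by `profile viewer` gives the corresponding report.

```python
import cProfile
import io
import pstats

import numpy as np


def fun(args, n_iter=1000):
    """The question's update rule, with the iteration count as a parameter."""
    a, b, p = args
    for itr in range(n_iter):
        for i in range(19):
            for j in range(19):
                if np.random.rand() < p:
                    a[i, j] -= 1
                else:
                    b[i, j] += 1
    return a + b


def profile_serial(n_calls=2, n_iter=50):
    """Profile n_calls serial calls of fun and return the top of the report."""
    args = [(np.random.randint(2, size=(19, 19)),
             np.random.randint(2, size=(19, 19)),
             np.random.rand()) for _ in range(n_calls)]
    profiler = cProfile.Profile()
    profiler.enable()
    for arg in args:
        fun(arg, n_iter=n_iter)
    profiler.disable()
    # Collect the 10 most expensive entries, sorted by cumulative time.
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(10)
    return out.getvalue()


if __name__ == '__main__':
    print(profile_serial())
```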

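One last check: you are setting up Matlab's Parallel Computing Toolbox with a local worker pool, right, and not connecting to remote machines or pulling in other resources? How many workers are there on the Matlab side?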

A similar question about "python - Parallel code runs much slower in Python than in Matlab" was found on Stack Overflow: https://stackoverflow.com/questions/16157513/
