python - Slow data structures in the Python multiprocessing library

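I am trying to use the multiprocessing library to run a simple computation over a large data structure. I need this for my thesis, so please don't be too hard on me.

When I decided to split my computation across several workers (threads, processes, or whatever you want to call them), I went through the Python documentation and found two modules, threading and multiprocessing. After reading about both, I chose multiprocessing because it looked like what I needed.

The problem is that my computation is much slower with multiple workers (processes). My first thought was the size of the input data. I know that for small inputs the cost of starting workers is far greater than the cost of a simple computation, but for larger structures the efficiency should improve.

To my great surprise, the plain iterative version of my computation (for example, the 2D Rosenbrock function) is several times faster than the version that uses several processes. The computation was run on 100k tuples.

I also noticed that accessing a multiprocessing.Queue is several times slower than accessing a collections.deque, but I really need this computation to use some kind of shared memory or something similar.

Can anyone tell me where the problem is? Is Python so efficient that this computation isn't worth splitting across processes? Am I using the right data structures? Should I change the way I think about multiprocessing? Or am I measuring it the wrong way? I would appreciate any hints on how to speed this up.

The full code is below: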

#!/usr/bin/python
import multiprocessing
from timeit import default_timer as timer
import random
import collections

class Worker(multiprocessing.Process):
    counter = 0
    def __init__(self, idx, from_queue):
        super(Worker, self).__init__()
        self.from_queue = from_queue
        self.idx = idx

    def run(self):
        print ("Worker started", self.idx)
        # popleft() reads from the front, so the None sentinel appended last ends the loop.
        for data in iter(self.from_queue.popleft, None):
            x_1, x_2 = data
            result = 100*(x_2-x_1**2)**2 + (1-x_1)**2

def main():
    tuple_counts = 100000
    min_x = -5
    max_x = 5

    tuples = multiprocessing.Queue()
    for _ in range(tuple_counts):
        # Build a 2-tuple; a set has no order and cannot be unpacked reliably.
        my_tuple = (random.uniform(min_x, max_x), random.uniform(min_x, max_x))
        tuples.put(my_tuple)

    cores = multiprocessing.cpu_count() - 1

    pops = []
    for _ in range(cores):
        pop = collections.deque()
        pops.append(pop)

    for pop in pops:
        for _ in range(int(tuple_counts/cores)):
            pop.append(tuples.get())


    for _ in range(int(tuple_counts % cores)):
        pops[_].append(tuples.get())

    for pop in pops:
        pop.append(None)

    workers = []
    process_time = 0
    process_time_start = timer()
    for i in range(multiprocessing.cpu_count()-1):
        worker = Worker(i, pops[i])
        workers.append(worker)
        worker.start()
    for worker in workers:
        worker.join()
    process_time_stop = timer()
    process_time += (process_time_stop-process_time_start)
    print("process_time", process_time)

    iter_time = 0
    iter_timer_start = timer()
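    # The queue was drained into the deques above, so iterate over those instead.
    for pop in pops:
        for data in pop:
            if data is None:  # skip the sentinel
                continue
            x_1, x_2 = data
            result = 100*(x_2-x_1**2)**2 + (1-x_1)**2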
    iter_timer_stop = timer()
    iter_time += (iter_timer_stop-iter_timer_start)
    print("iter_time", iter_time)

if __name__ == "__main__":
    main()

Best answer

You are passing arguments across process boundaries for a trivial computation. I would expect that to be very slow.
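
To get a rough feel for that cost, the sketch below compares evaluating the Rosenbrock function on one tuple with a pickle round trip of the same tuple. It assumes pickling is a fair stand-in for the serialization a multiprocessing.Queue performs and ignores pipe and locking costs, so the real overhead is probably larger. The function names are made up for illustration.

```python
import pickle
import timeit


def rosenbrock(x_1, x_2):
    """2D Rosenbrock function, as in the question."""
    return 100 * (x_2 - x_1 ** 2) ** 2 + (1 - x_1) ** 2


def pickle_round_trip(item):
    """Serialize and deserialize one item (assumed stand-in for a Queue transfer)."""
    return pickle.loads(pickle.dumps(item))


if __name__ == "__main__":
    point = (1.5, -2.25)
    n = 100000
    # Time the arithmetic alone, then the serialization alone.
    compute = timeit.timeit(lambda: rosenbrock(*point), number=n)
    transfer = timeit.timeit(lambda: pickle_round_trip(point), number=n)
    print("compute only:     ", compute)
    print("pickle round trip:", transfer)
```

If the second number is of the same order as the first or larger, moving items one by one between processes cannot pay off for this workload.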

If you need speed, I suggest you fall back to a single-threaded implementation and find a way to vectorize it with numpy. Profile it with cProfile. Attack the hot spots.
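
A minimal sketch of the profiling step, using only the standard library. The helper names (evaluate_all, profile_report) are hypothetical, and the plain loop is assumed to match the serial part of the question's code.

```python
import cProfile
import io
import pstats
import random


def rosenbrock(x_1, x_2):
    return 100 * (x_2 - x_1 ** 2) ** 2 + (1 - x_1) ** 2


def evaluate_all(points):
    """Plain single-process loop over all points."""
    return [rosenbrock(x_1, x_2) for x_1, x_2 in points]


def profile_report(points, top=5):
    """Profile evaluate_all and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(evaluate_all, points)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    random.seed(0)
    points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(100000)]
    print(profile_report(points))
```

The report lists call counts and time per function, which shows how much of the runtime is per-call Python overhead rather than arithmetic.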

A huge benefit of numpy is that it cuts Python overhead (name resolution, looping, and so on).
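
As a sketch of what that could look like for the Rosenbrock function from the question, assuming the points are stored as an (N, 2) numpy array instead of a queue of tuples (that layout is my assumption, not something from the question):

```python
import numpy as np


def rosenbrock_vectorized(x_1, x_2):
    """Evaluate the 2D Rosenbrock function on whole arrays at once."""
    return 100 * (x_2 - x_1 ** 2) ** 2 + (1 - x_1) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One row per point, columns are x_1 and x_2.
    points = rng.uniform(-5, 5, size=(100000, 2))
    results = rosenbrock_vectorized(points[:, 0], points[:, 1])
    print(results.shape)
```

The arithmetic is the same expression as in the question; the loop over elements now runs inside numpy rather than in the interpreter.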

Only once the single-threaded approach is fast should you turn to parallel processing.
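
If you do get to that point, one possible shape is sketched below: hand each worker one large chunk instead of single tuples, so the data crosses the process boundary only a few times. This uses multiprocessing.Pool rather than the Worker class from the question, and split_into_chunks is a hypothetical helper. Whether it beats the serial version still has to be measured.

```python
import multiprocessing
import random


def rosenbrock_chunk(chunk):
    """Evaluate the Rosenbrock function for every (x_1, x_2) pair in a chunk."""
    return [100 * (x_2 - x_1 ** 2) ** 2 + (1 - x_1) ** 2 for x_1, x_2 in chunk]


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    random.seed(0)
    points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(100000)]
    n_workers = max(1, multiprocessing.cpu_count() - 1)
    with multiprocessing.Pool(n_workers) as pool:
        # One task per worker keeps the number of process-boundary crossings small.
        parts = pool.map(rosenbrock_chunk, split_into_chunks(points, n_workers))
    results = [value for part in parts for value in part]
    print(len(results))
```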

Another benefit of vectorizing the problem is that numpy releases the GIL during lengthy calls, which allows real threading instead of multiprocessing.
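
A sketch of the threaded variant, assuming the points are already in an (N, 2) numpy array. threaded_rosenbrock and the default of 4 threads are illustrative choices, and any actual speedup depends on the array size and on which numpy operations release the GIL, so measure before relying on it.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def rosenbrock_vectorized(x_1, x_2):
    return 100 * (x_2 - x_1 ** 2) ** 2 + (1 - x_1) ** 2


def threaded_rosenbrock(points, n_threads=4):
    """Evaluate row chunks of an (N, 2) array in a thread pool."""
    # Threads share memory, so the chunks are not pickled or copied between processes.
    chunks = np.array_split(points, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        parts = executor.map(lambda c: rosenbrock_vectorized(c[:, 0], c[:, 1]), chunks)
        return np.concatenate(list(parts))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(-5, 5, size=(1000000, 2))
    print(threaded_rosenbrock(points).shape)
```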

This question and answer come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/47295824/
