Python multiprocessing with an asynchronously shared numpy array: pool vs queue

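I would like to generate periodic Perlin noise on a regular grid. I need to generate several maps, and the grid is very large, so I want to use multiprocessing and generate one map per core.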

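The maps will be plotted to a figure and written one after another into a binary dat file.
They will be stored in a single numpy array of size number of maps * number of nodes, so that one slice is one map. Each process can therefore access a different region of the array at the same time without any concern.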
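
The layout described above can be sketched as follows. This is only a minimal illustration, assuming small hypothetical sizes for `nb_maps` and `nb_nodes`; it shows that a row slice of the reshaped array is a view onto the shared buffer, so writing to one map leaves the others untouched.

```python
import ctypes
import multiprocessing as mp

import numpy as np

# Hypothetical small sizes, chosen only for illustration.
nb_maps, nb_nodes = 3, 4

# Flat shared buffer of doubles, zero-initialised by mp.Array.
base = mp.Array(ctypes.c_double, nb_maps * nb_nodes)

# Wrap the buffer without copying, then view it as (nb_maps, nb_nodes).
maps = np.frombuffer(base.get_obj()).reshape(nb_maps, nb_nodes)

# One row is one map; writing through the slice writes to the shared buffer.
row = maps[1, :]
row[:] = 7.0
```

Because `row` is a view and not a copy, the values are also visible through `base` itself.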

As references I used this thread, which uses a pool, and this one, where I used a queue to do some plotting with multiprocessing.

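I came up with two versions. The one with a queue works fine on my own computer, but not on my lab workstation or on my work laptop: there is no error message, it simply hangs at some point.
The second one works well, and I find it simpler than the first example because I just write directly into the numpy array. (I don't quite understand why the async case in the first link needs all those functions and the init.)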

My question is: why does my first code have a problem?
I have only included below the code that I think is relevant.

Thanks for your help.

First attempt:

def generate_irradiation_maps(rad_v):
    while tasks_queue.empty() == False:
        print("fetching work ...")
        map_index = tasks_queue.get()  # get some work to do from the queue
        print("----> working on map: %s" % map_index)
        perm = list(range(permsize))  # a list, so shuffle and += also work on Python 3
        random.shuffle(perm)
        perm += perm
        for i in range(nb_nodes):
            # call the perlin function: fBm
            rad_v[map_index, i] = fBm(perm, x[i] * freq, y[i] * freq, int(sizex * freq), int(sizey * freq), octs, persistance)
        rad_v[map_index, :] = rad_v[map_index, :] + abs(min(rad_v[map_index, :]))
        rad_v[map_index, :] = rad_v[map_index, :] / max(rad_v[map_index, :])
        figure = plt.figure(figsize=(20, 7))
        plt.tricontourf(x, y, rad_v[map_index, :])
        plt.axis('image')
        plt.colorbar(shrink=.5)
        figure.savefig('diff_gb_and_pf_irrad_c_map_' + str(map_index) + '.png')
        plt.clf()
        plt.close()
        tasks_queue.task_done()  # work for this item finished

start_time = time.time()
nb_maps = 10
nb_proc = 1  # number of processes

print("generating %d irradiation maps" % nb_maps)
irrad_c_base_array = mp.Array(ctypes.c_double, nb_maps * nb_nodes)  
irrad_c = np.frombuffer(irrad_c_base_array.get_obj())
irrad_c = irrad_c.reshape(nb_maps, nb_nodes)

tasks_queue = mp.JoinableQueue()  # a queue to pile up the work to do

jobs = list(range(nb_maps))  # each job is composed of a map
print("inserting jobs in the queue...")
for job in jobs:
    tasks_queue.put(job)
print("done")

# launch the processes
for i in range(nb_proc):
    current_process = mp.Process(target=generate_irradiation_maps, args=(irrad_c,))
    current_process.start()

# wait for all tasks to be treated
tasks_queue.join()

Second attempt:

def generate_irradiation_maps(arg_list):
    map_index = arg_list[0]
    print('working on map %i ' % map_index)
    perm = list(range(permsize))  # a list, so shuffle and += also work on Python 3
    random.shuffle(perm)
    perm += perm
    for i in range(nb_nodes):
        arg_list[1][i] = fBm(perm, x[i] * freq, y[i] * freq, int(sizex * freq), int(sizey * freq), octs, persistance)
    arg_list[1][:] = arg_list[1][:] + abs(min(arg_list[1][:]))
    arg_list[1][:] = arg_list[1][:] / max(arg_list[1][:])
    # plot
    figure = plt.figure(figsize=(20, 7))
    #plt.tricontourf(x, y, rad_v[map_index, :])
    plt.tricontourf(x, y, arg_list[1][:])
    plt.axis('image')
    plt.colorbar(shrink=.5)
    figure.savefig('diff_gb_and_pf_irrad_c_map_' + str(map_index) + '.png')
    plt.clf()
    plt.close()


start_time = time.time()
nb_maps = 2
nb_proc = 2  # number of processes

print("generating %d irradiation maps" % nb_maps)
irrad_c_base_array = mp.Array(ctypes.c_double, nb_maps * nb_nodes)  # shared array, accessible from all processes; each process writes to a different region
irrad_c = np.frombuffer(irrad_c_base_array.get_obj())
irrad_c = irrad_c.reshape(nb_maps, nb_nodes)

args = [[i,irrad_c[i,:]] for i in range(nb_maps)]

with closing(mp.Pool(processes=nb_proc)) as jobs_pool:
    jobs_pool.map_async(generate_irradiation_maps,args)
jobs_pool.join()

Best answer

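I have personally had a lot of trouble with multiprocessing. This blog post hints at one possibility. If you switch between POSIX systems (i.e. Linux, Unix, or Mac) and Windows, the way child processes are started is different. The end of the blog post recommends adding the following lines to help prevent your processes from deadlocking.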

from multiprocessing import set_start_method
set_start_method("spawn")

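Unfortunately, the code you shared is not self-contained, so I could not test it. If you may be running the code on different operating systems, give this a try and see whether it helps!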
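
For anyone who wants to try the "spawn" suggestion on the queue version, here is a rough, untested-against-your-setup sketch. It rests on a few assumptions: with "spawn" a child process does not inherit objects created in the parent at run time, so the shared array and the queue are passed to the worker as arguments; `mp.get_context("spawn")` is used as a locally scoped alternative to `set_start_method("spawn")`; and the `tasks_queue.empty()` test is swapped for a `None` sentinel per worker, since `empty()` is documented as unreliable for multiprocessing queues. The names `worker` and `main` and the placeholder computation are hypothetical, not taken from the question.

```python
import ctypes
import multiprocessing as mp

import numpy as np


def worker(shared, n_cols, tasks):
    """Fill one row of the shared array per task until a None sentinel arrives."""
    # Re-wrap the shared buffer as a 2-D float64 view inside the child.
    maps = np.frombuffer(shared.get_obj()).reshape(-1, n_cols)
    while True:
        row = tasks.get()  # blocks until an item is available; no empty() check
        try:
            if row is None:  # sentinel: no more work for this worker
                break
            maps[row, :] = float(row)  # placeholder for the real noise computation
        finally:
            tasks.task_done()  # also called for the sentinel


def main(n_maps=4, n_cols=5, n_proc=2):
    ctx = mp.get_context("spawn")  # same start method as set_start_method("spawn")
    shared = ctx.Array(ctypes.c_double, n_maps * n_cols)
    tasks = ctx.JoinableQueue()
    for row in range(n_maps):
        tasks.put(row)
    for _ in range(n_proc):
        tasks.put(None)  # one sentinel per worker
    procs = [ctx.Process(target=worker, args=(shared, n_cols, tasks))
             for _ in range(n_proc)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait on the processes themselves
    return np.frombuffer(shared.get_obj()).reshape(n_maps, n_cols).copy()


if __name__ == "__main__":
    print(main())
```

The `if __name__ == "__main__":` guard matters with "spawn", because each child re-imports the main module. Joining the processes instead of the queue means the parent does not wait forever if a child dies before calling `task_done()`.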

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/18593311/
