python - What exactly do batch_size and pre_dispatch mean in joblib

Tags: python multithreading python-3.x multiprocessing joblib

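From the documentation at https://pythonhosted.org/joblib/parallel.html#parallel-reference-documentation it is not clear to me what exactly batch_size and pre_dispatch mean.

Let's consider the case where we use the 'multiprocessing' backend with 2 jobs (2 processes) and have 10 tasks to compute.

As I understand it:

batch_size controls how many tasks are pickled at a time. If you set batch_size = 5, joblib pickles 5 tasks and sends them to a process in one go; once they arrive, the process solves them sequentially, one after another. With batch_size=1, joblib pickles and sends one task at a time, if and only if that process has finished its previous task.

To illustrate what I mean: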

def solve_one_task(task):
    # Solves one task at a time
    ....
    return result

def solve_list(list_of_tasks):
    # Solves batch of tasks sequentially
    return [solve_one_task(task) for task in list_of_tasks]

So this code:

Parallel(n_jobs=2, backend = 'multiprocessing', batch_size=5)(
        delayed(solve_one_task)(task) for task in tasks)

is equivalent (in terms of performance) to this code:

slices = [(0, 5), (5, 10)]
Parallel(n_jobs=2, backend='multiprocessing', batch_size=1)(
        delayed(solve_list)(tasks[start:stop]) for start, stop in slices)

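Am I right? And what does pre_dispatch mean, then?

Best answer

As it turns out, I was right: the two pieces of code perform very similarly, so batch_size works as I expected in the question. pre_dispatch (as the documentation states) controls the number of instantiated tasks in the task queue.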
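The grouping that batch_size is described as performing can be sketched without joblib. The snippet below is only an illustration under that description, not joblib's implementation; `make_batches` and `solve_batch` are hypothetical helper names introduced here.

```python
from itertools import islice


def make_batches(tasks, batch_size):
    """Yield consecutive lists of at most `batch_size` tasks from any iterable."""
    iterator = iter(tasks)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def solve_batch(func, batch):
    # One worker receives the whole batch and runs its tasks sequentially.
    return [func(task) for task in batch]
```

With 10 tasks and a batch size of 5, `make_batches` produces two lists of five tasks each, which corresponds to the two slices used in the question. The joblib script that follows exercises pre_dispatch with a generator that logs when each task is pulled.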

from joblib import Parallel, delayed
from time import sleep, time

def solve_one_task(task):
    # Solves one task at a time
    print("%d. Task #%d is being solved"%(time(), task))
    sleep(5)
    return task

def task_gen(max_task):
    current_task = 0
    while current_task < max_task:
        print("%d. Task #%d was dispatched"%(time(), current_task))
        yield current_task
        current_task += 1

Parallel(n_jobs=2, backend = 'multiprocessing', batch_size=1, pre_dispatch=3)(
        delayed(solve_one_task)(task) for task in task_gen(10))

Output:

1450105367. Task #0 was dispatched
1450105367. Task #1 was dispatched
1450105367. Task #2 was dispatched
1450105367. Task #0 is being solved
1450105367. Task #1 is being solved
1450105372. Task #2 is being solved
1450105372. Task #3 was dispatched
1450105372. Task #4 was dispatched
1450105372. Task #3 is being solved
1450105377. Task #4 is being solved
1450105377. Task #5 was dispatched
1450105377. Task #5 is being solved
1450105377. Task #6 was dispatched
1450105382. Task #7 was dispatched
1450105382. Task #6 is being solved
1450105382. Task #7 is being solved
1450105382. Task #8 was dispatched
1450105387. Task #9 was dispatched
1450105387. Task #8 is being solved
1450105387. Task #9 is being solved
Out[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
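In this log, tasks #0 to #2 are pulled from the generator right away even though only two workers exist, and each later task is pulled only once an earlier one has finished. A rough way to picture that behaviour, using only the standard library, is sketched below. It assumes a thread pool instead of processes, and `run_with_pre_dispatch`, `n_workers` and `max_in_flight` are hypothetical names; joblib's real dispatching logic is more involved.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def run_with_pre_dispatch(func, tasks, n_workers=2, pre_dispatch=3):
    """Consume `tasks` lazily, keeping at most `pre_dispatch` tasks
    submitted but not yet finished. Returns (results, max_in_flight)."""
    task_iter = enumerate(tasks)
    results = {}
    max_in_flight = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {}
        # Initial fill: pull the first `pre_dispatch` tasks from the iterable.
        for index, task in islice(task_iter, pre_dispatch):
            pending[pool.submit(func, task)] = index
        while pending:
            max_in_flight = max(max_in_flight, len(pending))
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                results[pending.pop(future)] = future.result()
            # Refill: pull one new task for every task that just finished.
            for index, task in islice(task_iter, len(done)):
                pending[pool.submit(func, task)] = index
    # Return the results in the original task order.
    return [results[i] for i in range(len(results))], max_in_flight
```

Calling `run_with_pre_dispatch(solve_one_task, task_gen(10))` with the functions from the answer would pull three tasks up front and then one more per completed task, so the number of tasks instantiated but unfinished never exceeds `pre_dispatch`.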

A similar question about "python - What exactly do batch_size and pre_dispatch mean in joblib" can be found on Stack Overflow: https://stackoverflow.com/questions/33714678/
