python - Multiprocessing in Python: what does the forkserver process inherit from the parent process?

Tags: python multiprocessing global multiprocess

I am trying to use forkserver, and I ran into NameError: name 'xxx' is not defined in the worker processes.
I am using Python 3.6.4, but the documentation should be the same. At https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods it says:

The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.


It also says:

Better to inherit than pickle/unpickle

When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.


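Apparently, the key object my worker processes need was not inherited by the server process and then passed on to the workers. Why does this happen? What exactly does the forkserver process inherit from the parent process?
Here is what my code looks like: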
import multiprocessing
import (a bunch of other modules)

def worker_func(nameList):
    global largeObject
    for item in nameList:
        # get some info from largeObject using item as index
        # do some calculation
        return [item, info]

if __name__ == '__main__':
    result = []
    largeObject # This is my large object, it's read-only and no modification will be made to it.
    nameList # Here is a list variable that I will need to get info for each item in it from the largeObject    
    ctx_in_main = multiprocessing.get_context('forkserver')
    print('Start parallel, using forking/spawning/?:', ctx_in_main.get_start_method())
    cores = ctx_in_main.cpu_count()
    with ctx_in_main.Pool(processes=4) as pool:
        for x in pool.imap_unordered(worker_func, nameList):
            result.append(x)
Thanks!

Best answer

Theory
Below is an excerpt from Bojan Nikolic's blog:

Modern Python versions (on Linux) provide three ways of starting the separate processes:

  1. Fork()-ing the parent processes and continuing with the same processes image in both parent and child. This method is fast, but potentially unreliable when parent state is complex

  2. Spawning the child processes, i.e., fork()-ing and then execv to replace the process image with a new Python process. This method is reliable but slow, as the processes image is reloaded afresh.

  3. The forkserver mechanism, which consists of a separate Python server with that has a relatively simple state and which is fork()-ed when a new processes is needed. This method combines the speed of Fork()-ing with good reliability (because the parent being forked is in a simple state).


Forkserver

The third method, forkserver, is illustrated below. Note that children retain a copy of the forkserver state. This state is intended to be relatively simple, but it is possible to adjust this through the multiprocess API through the set_forkserver_preload() method.
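
To check which of these start methods exist on a given platform, something like the following minimal sketch can help. It only uses `multiprocessing.get_all_start_methods()`, `multiprocessing.get_context()` and the context's `get_start_method()`; the helper name `describe_start_methods` is made up for this illustration and is not part of the library.

```python
import multiprocessing


def describe_start_methods():
    """Map each start method available on this platform to the name
    reported by a context created for it (hypothetical helper)."""
    report = {}
    for method in multiprocessing.get_all_start_methods():
        # get_context() returns an object with the same API as the
        # multiprocessing module, bound to one start method.
        ctx = multiprocessing.get_context(method)
        report[method] = ctx.get_start_method()
    return report


if __name__ == "__main__":
    for method, reported in describe_start_methods().items():
        print(f"requested={method!r}, context reports={reported!r}")
```

On Linux this typically lists fork, spawn and forkserver; on Windows only spawn is available, so forkserver-specific code should probably check the list first.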


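Practice
So, if you want the child processes to inherit something from the parent, you have to declare it to the fork server through set_forkserver_preload(module_names), which sets a list of module names that the forkserver process will try to import. Here is an example: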
# inherited.py
large_obj = {"one": 1, "two": 2, "three": 3}
# main.py
import multiprocessing
import os
from time import sleep

from inherited import large_obj


def worker_func(key: str):
    # Print in the same format as the output shown below
    print(f"PID={os.getpid()}, obj id={id(large_obj)}")
    sleep(1)
    return large_obj[key]


if __name__ == '__main__':
    result = []
    ctx_in_main = multiprocessing.get_context('forkserver')
    ctx_in_main.set_forkserver_preload(['inherited'])
    cores = ctx_in_main.cpu_count()
    with ctx_in_main.Pool(processes=cores) as pool:
        for x in pool.imap(worker_func, ["one", "two", "three"]):
            result.append(x)
    for res in result:
        print(res)
Output:
# The PIDs are different but the address is always the same
PID=18603, obj id=139913466185024
PID=18604, obj id=139913466185024
PID=18605, obj id=139913466185024
If we do not use preloading:
...
    ctx_in_main = multiprocessing.get_context('forkserver')
    # ctx_in_main.set_forkserver_preload(['inherited']) 
    cores = ctx_in_main.cpu_count()
...
# The PIDs are different, the addresses are different too
# (but sometimes they can coincide)
PID=19046, obj id=140011789067776
PID=19047, obj id=140011789030976
PID=19048, obj id=140011789030912
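
As for the NameError in the question: as far as I understand CPython's implementation, a spawn or forkserver worker does not re-run the `if __name__ == '__main__'` block. It re-imports the main script under the name `__mp_main__`, so only module-level names exist in the worker. The sketch below imitates that step in-process with `runpy.run_path`; the script text, the file name `demo_main.py` and the helper `names_visible_to_worker` are all invented for this illustration.

```python
import os
import runpy
import tempfile

# A stand-in for a user's main script (hypothetical content).
SCRIPT = '''
module_level = "defined at import time"

if __name__ == "__main__":
    guarded = "defined only when run as the main script"
'''


def names_visible_to_worker(source):
    """Run `source` under the module name '__mp_main__', roughly the way a
    spawn/forkserver child prepares its copy of the main module, and
    return the resulting global namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_main.py")
        with open(path, "w") as fh:
            fh.write(source)
        return runpy.run_path(path, run_name="__mp_main__")


namespace = names_visible_to_worker(SCRIPT)
print("module_level" in namespace)  # True
print("guarded" in namespace)       # False: the guard did not run
```

This is why an object created inside the main guard (like largeObject in the question) is missing in the workers, while an object created at import time of a module, such as large_obj in inherited.py, is available.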

This question, "Multiprocessing in Python: what does the forkserver process inherit from the parent process?", comes from Stack Overflow: https://stackoverflow.com/questions/63424251/
