我正在尝试使用 forkserver
我遇到了NameError: name 'xxx' is not defined
在工作进程中。
我使用的是 Python 3.6.4,但文档应该相同,来自 https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods它说:
The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
此外,它还说:
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
显然,我的工作进程需要处理的关键对象没有被服务器进程继承然后传递给工作人员,为什么会发生这种情况?我想知道 forkserver 进程究竟从父进程继承了什么?
这是我的代码的样子:
import multiprocessing
import (a bunch of other modules)
def worker_func(nameList):
global largeObject
for item in nameList:
# get some info from largeObject using item as index
# do some calculation
return [item, info]
if __name__ == '__main__':
result = []
largeObject # This is my large object, it's read-only and no modification will be made to it.
nameList # Here is a list variable that I will need to get info for each item in it from the largeObject
ctx_in_main = multiprocessing.get_context('forkserver')
print('Start parallel, using forking/spawning/?:', ctx_in_main.get_context())
cores = ctx_in_main.cpu_count()
with ctx_in_main.Pool(processes=4) as pool:
for x in pool.imap_unordered(worker_func, nameList):
result.append(x)
谢谢!最好的,
最佳答案
理论
以下是 Bojan Nikolic blog 的摘录
Modern Python versions (on Linux) provide three ways of starting the separate processes:
Fork()-ing the parent processes and continuing with the same processes image in both parent and child. This method is fast, but potentially unreliable when parent state is complex
Spawning the child processes, i.e., fork()-ing and then execv to replace the process image with a new Python process. This method is reliable but slow, as the processes image is reloaded afresh.
The forkserver mechanism, which consists of a separate Python server with that has a relatively simple state and which is fork()-ed when a new processes is needed. This method combines the speed of Fork()-ing with good reliability (because the parent being forked is in a simple state).
Forkserver
The third method, forkserver, is illustrated below. Note that children retain a copy of the forkserver state. This state is intended to be relatively simple, but it is possible to adjust this through the multiprocess API through the
set_forkserver_preload()
method.
实践
因此,如果您希望子进程从父进程继承某些东西,则必须在 中指定。 fork 服务器 通过
set_forkserver_preload(modules_names)
声明,它设置了要尝试在 forkserver 进程中加载的模块名称列表。我在下面举一个例子:# inherited.py
large_obj = {"one": 1, "two": 2, "three": 3}
# main.py
import multiprocessing
import os
from time import sleep
from inherited import large_obj
def worker_func(key: str):
print(os.getpid(), id(large_obj))
sleep(1)
return large_obj[key]
if __name__ == '__main__':
result = []
ctx_in_main = multiprocessing.get_context('forkserver')
ctx_in_main.set_forkserver_preload(['inherited'])
cores = ctx_in_main.cpu_count()
with ctx_in_main.Pool(processes=cores) as pool:
for x in pool.imap(worker_func, ["one", "two", "three"]):
result.append(x)
for res in result:
print(res)
输出:# The PIDs are different but the address is always the same
PID=18603, obj id=139913466185024
PID=18604, obj id=139913466185024
PID=18605, obj id=139913466185024
如果我们不使用预加载...
ctx_in_main = multiprocessing.get_context('forkserver')
# ctx_in_main.set_forkserver_preload(['inherited'])
cores = ctx_in_main.cpu_count()
...
# The PIDs are different, the addresses are different too
# (but sometimes they can coincide)
PID=19046, obj id=140011789067776
PID=19047, obj id=140011789030976
PID=19048, obj id=140011789030912
关于python - python中的多处理-forkserver进程从父进程继承了什么?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63424251/