Python multiprocessing — global variables in separate processes sharing the same id?

标签 python parallel-processing multiprocessing python-multiprocessing

From this question I learned that:

When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.



To verify this behavior, I put together a test script:
import time
import multiprocessing as mp
from multiprocessing import Pool
x = [0]  # global
def worker(c):
    if c == 1:  # wait for proc 2 to finish; is global x overwritten by now?
        time.sleep(2)
    print('enter: x =', x, 'with id', id(x), 'in proc', mp.current_process())
    x[0] = c
    print('exit: x =', x, 'with id', id(x), 'in proc', mp.current_process())
    return x[0]

pool = Pool(processes=2)
x_vals = pool.map(worker, [1, 2])
print('parent: x =', x, 'with id', id(x), 'in proc', mp.current_process())
print('final output', x_vals)

The output (on CPython) looks something like:
enter: x = [0] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-2, started daemon)>
exit: x = [2] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-2, started daemon)>
enter: x = [0] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-1, started daemon)>
exit: x = [1] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-1, started daemon)>
parent: x = [0] with id 140138406834504 in proc <_MainProcess(MainProcess, started)>
final output [1, 2]

How should I explain the fact that the id of x is shared across all processes, yet x takes different values? Isn't id conceptually the memory address of a Python object?
I suppose this would be possible if the memory space is cloned in the child processes. Is there, then, something I can use to get the actual physical memory address of a Python object?

Best answer

Shared state

When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.


The key point here seems to be:

That global state is not shared..."


...refers to the global state of that child process. But this does not mean that parts of the parent's global state cannot be shared with the child, as long as the child does not try to write to them. As soon as it does, that part is copied, the change is applied to the copy, and it remains invisible to the parent.
Background:
On Unix, 'fork' is the default method for starting child processes:

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.


Fork is implemented using copy-on-write, so as long as you don't assign a new object to x, no copying takes place, and the child processes share the same list with their parent.
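This can be seen with os.fork() directly (Unix only; a minimal sketch): the child starts with the same list at the same virtual address, but its write triggers copy-on-write and never reaches the parent:

```python
import os

x = [0]  # inherited by the child at the same virtual address
pid = os.fork()
if pid == 0:
    # child: the write triggers copy-on-write; the parent's page is untouched
    x[0] = 99
    print('child:  x =', x, 'id =', id(x))
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print('parent: x =', x, 'id =', id(x))  # still [0], same id as in the child
```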

Memory addresses

How should I explain the fact that the id of x is shared in all the processes, yet x takes different values?


Fork creates a child process in which the virtual address space is identical to the virtual address space of the parent. The virtual addresses will all map to the same physical addresses until copy-on-write occurs.

Modern OSes use virtual addressing. Basically the address values (pointers) you see inside your program are not actual physical memory locations, but pointers to an index table (virtual addresses) that in turn contains pointers to the actual physical memory locations. Because of this indirection, you can have the same virtual address point to different physical addresses IF the virtual addresses belong to index tables of separate processes. link



Then is there something I can use to get the actual physical memory address of a Python object?


There seems to be no way to get the actual physical memory address (link). id returns the virtual (logical) memory address (in CPython). The actual translation from virtual to physical memory addresses is handled by the MMU.
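That id is a virtual address in CPython can be made tangible with ctypes, which can reinterpret the raw address back into an object reference (a CPython-specific trick, not portable to other interpreters):

```python
import ctypes

x = [0]
addr = id(x)  # in CPython, the object's virtual memory address
# cast the raw address back to a Python object reference
same = ctypes.cast(addr, ctypes.py_object).value
print(same is x)  # True
```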

A similar question on Stack Overflow: https://stackoverflow.com/questions/48491143/
