python - Why does the child process import the main module at startup on Windows but not on Linux?

Tags: python multiprocessing porting

Example: the following code runs fine on Ubuntu 14.04

# some imports
import numpy as np
import glob
import sys
import multiprocessing
import os

# creating some temporary data
tmp_dir = os.path.join('tmp', 'nptest')
if not os.path.exists(tmp_dir):
    os.makedirs(tmp_dir)
    for i in range(10):
        x = np.random.rand(100, 50)
        y = np.random.rand(200, 20)
        file_path = os.path.join(tmp_dir, '%05d.npz' % i)
        np.savez_compressed(file_path, x=x, y=y)

def read_npz(path):
    data = dict(np.load(path))
    return (data['x'], data['y'])

def parallel_read(files):
    pool = multiprocessing.Pool(processes=4)
    data_list = pool.map(read_npz, files)
    return data_list

files = glob.glob(os.path.join(tmp_dir, '*.npz'))
x = parallel_read(files)
print('done')

But on Windows 7 it fails with the following error message:

    cmd = get_command_line() + [rhandle]
    pool = multiprocessing.Pool(processes=4)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
  File "C:\Anaconda\lib\multiprocessing\__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.

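As I understand it, this happens because the child process imports the main module at startup on Windows, but not on Linux. The problem on Windows can be prevented by putting x = parallel_read(files) under the main guard. For example: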

if __name__ == '__main__':    
    x = parallel_read(files)
    print('done')
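
A fuller version of that guard might look like the sketch below. It is only an illustration, not the original script: to stay free of third-party dependencies it reads small text files instead of .npz archives, and the names read_length, make_files and main are hypothetical. The point it tries to show is that every statement with side effects (creating the temporary data, listing the files, starting the pool) moves into main(), so a child that re-imports the module only finds function definitions.

```python
import glob
import multiprocessing
import os
import shutil
import tempfile


def read_length(path):
    # Hypothetical stand-in for read_npz: returns the number of characters.
    with open(path) as handle:
        return len(handle.read())


def make_files(directory, count=4):
    # Create a few small text files; file i contains i + 1 characters.
    for i in range(count):
        with open(os.path.join(directory, '%05d.txt' % i), 'w') as handle:
            handle.write('x' * (i + 1))


def parallel_read(files):
    pool = multiprocessing.Pool(processes=2)
    try:
        return pool.map(read_length, files)
    finally:
        # Shut the workers down cleanly once the results are in.
        pool.close()
        pool.join()


def main():
    tmp_dir = tempfile.mkdtemp()
    try:
        make_files(tmp_dir)
        files = sorted(glob.glob(os.path.join(tmp_dir, '*.txt')))
        print(parallel_read(files))
    finally:
        shutil.rmtree(tmp_dir)


if __name__ == '__main__':
    # Only the process that was started as a script reaches this call.
    main()
```

With this layout nothing happens at import time, so it should not matter whether the platform forks or re-imports the module to start its workers.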

Why does the child process import the main module at startup on Windows but not on Linux?

Best answer

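Windows doesn't have a fork function. Most other operating systems do, and on those platforms multiprocessing uses it to start new processes that have the same state as the parent process. Windows has to set up the child process's state by other means, which includes importing the __main__ module.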
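
The difference can be made visible with a small experiment. The sketch below assumes Python 3.4 or later (it relies on multiprocessing.get_context, which the Python 2.7 traceback in the question predates), and the names counter, report and child_sees are made up for the illustration. The parent changes a module-level variable after import; a forked child inherits the changed value, while a child started by re-importing the main module sees the value assigned at import time.

```python
import multiprocessing

counter = 0  # module-level state, set at import time


def report(queue):
    # Runs in the child: send back the value of counter that the child sees.
    queue.put(counter)


def child_sees(start_method):
    # Start one child with the given start method and return what it reports.
    ctx = multiprocessing.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=report, args=(queue,))
    proc.start()
    value = queue.get(timeout=30)  # raise instead of waiting forever
    proc.join()
    return value


if __name__ == '__main__':
    counter = 42  # changed only in the parent, after the import has finished
    for method in multiprocessing.get_all_start_methods():
        print(method, child_sees(method))
```

Where 'fork' is available, the forked child should report 42, because it is a copy of the parent. A 'spawn' child should report 0: it imports the main module again under a name other than '__main__', so the guarded block never runs there. On Windows only 'spawn' is listed.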

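Note that Python 3.4 (and later) lets you use a non-forking implementation on all operating systems, if you want to. See issue 8713 on the bug tracker for the discussion of this feature.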
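
As a rough sketch of what that looks like in practice (Python 3.4+; square and run_with are hypothetical names), multiprocessing.get_context('spawn') returns an object with the usual multiprocessing API bound to the non-forking start method. Running a script this way on Linux is one way to reproduce the Windows behaviour before porting.

```python
import multiprocessing


def square(n):
    return n * n


def run_with(start_method, values):
    # The context leaves the global default start method untouched.
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == '__main__':
    # Lists the start methods this platform supports.
    print(multiprocessing.get_all_start_methods())
    # Workers are started without fork, as on Windows.
    print(run_with('spawn', [1, 2, 3]))
```

multiprocessing.set_start_method('spawn') is the global alternative; it should be called at most once, inside the main guard. Either way, the guard is required as soon as 'spawn' is in use, whatever the operating system.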

Regarding "python - Why does the child process import the main module at startup on Windows but not on Linux?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35422153/
