python - Multiprocessing pool manager namespace EOF error

Tags: python pandas multiprocessing

When I share a pandas DataFrame through a multiprocessing Manager namespace, and every target function calls .sample(5000) on that DataFrame, I get an EOFError.

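import multiprocessing as mp

import pandas as pd
from psutil import cpu_count  # assumed: the logical= keyword matches psutil.cpu_count

# sharedData is the Manager namespace; load_data, paths and my_error
# are not shown in the question.

def get_sample(i):
    print("start round {}".format(i))
    # Reading sharedData.data goes through the namespace proxy to the manager process.
    sample = sharedData.data.sample(5000, random_state=i)

if __name__ == '__main__':
    # Load the input files in parallel and store the combined frame in the namespace.
    with mp.Pool(cpu_count(logical=False)) as pool0:
        results = pool0.map(load_data, paths)
        sharedData.data = pd.concat(results, axis=0, copy=False)
        genes = sharedData.data.columns
        pool0.close()
        pool0.join()
        del results

    # Sampling: 1000 rounds, each drawing 5000 rows from the shared frame.
    with mp.Pool(cpu_count(logical=True)) as pool:
        print("start sampling, total round = {}".format(1000))
        r = pool.map_async(get_sample, range(1000), error_callback=my_error)
        results2 = r.get()
        pool.close()
        pool.join()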

This is the output, ending with the traceback:
start round 145
round35 returns output
round18 returns output
rount161 returns output
start round 704
start round 720
start round 736
start round 752
start round 768
start round 784
start round 800
start round 816
start round 832
start round 848
start round 864
start round 880
start round 896
start round 912
start round 928
start round 944
start round 960
start round 976
start round 992
from error_callback: 

multiprocessing.pool.RemoteTraceback: 
multiprocessing.pool.RemoteTraceback: 
"""

Traceback (most recent call last):
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "sampling2temp.py", line 38, in get_sample_ys
    sample = sharedData.data.sample(5000, random_state=i)
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/managers.py", line 1060, in __getattr__
    return callmethod('__getattribute__', (key,))
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
    kind, result = conn.recv()
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "sampling2temp.py", line 105, in <module>
    results2 = r.get()
  File "/usr/usc/python/3.6.0/lib/python3.6/multiprocessing/pool.py", line 608, in get
    raise self._value
EOFError

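It looks as if tasks 704 through 992 never returned any output, and then the manager process shut down. So when one of the running tasks reads from manager.namespace.data, it receives an EOF.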

By the way, if I change sample(5000) to sample(2500) and reduce the size of Manager.Namespace.data from 2127096024 bytes to 1738281624 bytes, the EOF problem goes away. Is that because each worker uses too much memory?
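For a rough sense of scale: the traceback shows each read of sharedData.data going through the proxy's __getattr__ and conn.recv(), so the worker receives its own deserialized copy of the object. The sketch below is only back-of-the-envelope arithmetic under the assumption that the manager and every worker hold one full copy at the same time. The function name and the worker counts are hypothetical, and serialization buffers are ignored.

```python
def estimated_copy_bytes(frame_bytes: int, workers: int) -> int:
    """Estimate total bytes if the manager and each worker hold one full copy."""
    return frame_bytes * (1 + workers)


if __name__ == "__main__":
    # The two sizes come from the question; the worker counts are illustrative guesses.
    for size in (2127096024, 1738281624):
        for workers in (4, 8, 16):
            total = estimated_copy_bytes(size, workers)
            print(f"{size} bytes x (1 + {workers}) = {total / 2**30:.1f} GiB")
```

Even with a handful of workers the estimate reaches tens of gigabytes, which may exceed what the machine has available.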

Best answer

A multiprocessing.Connection receiver raises EOFError if all of the associated sending connections have been closed.
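This behavior can be reproduced without a Manager. The following is a minimal sketch using multiprocessing.Pipe within a single process. The function name recv_after_close is made up for illustration, and the example stands in for what happens when the process on the other end of the connection goes away.

```python
import multiprocessing as mp


def recv_after_close():
    """Send one message, close the sending end, then try to receive twice."""
    # With duplex=False the first connection can only receive, the second only send.
    receiver, sender = mp.Pipe(duplex=False)
    sender.send("hello")
    sender.close()  # nothing holds the sending end any more

    first = receiver.recv()  # data already in the pipe is still delivered
    try:
        receiver.recv()  # nothing left to read and the other end is closed
    except EOFError:
        return first, True
    return first, False


if __name__ == "__main__":
    message, got_eof = recv_after_close()
    print(message, got_eof)
```

In the question's traceback the same EOFError surfaces inside the proxy's _callmethod, which is waiting on conn.recv() for a reply from the manager process.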

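Judging by the stack trace, multiprocessing.Manager uses multiprocessing.Connection under the hood. Since your code does not appear to terminate the manager process early, I think the manager process must be hitting an exception and terminating before you are done with it. Because reducing the sample size seems to fix the problem, it's possible that the manager process gets killed by the OOM killer for using too much memory. You can check whether that is the case with this command: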

dmesg | egrep -i "killed process"

You would expect to see something like this:
host kernel: Out of Memory: Killed process 1234 (python).
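
If the kernel log has already been saved as text, the same case-insensitive filter can be applied from Python. This is only a sketch: find_oom_kills is a hypothetical helper, and the first line of the sample log is invented for the demonstration (the second is the example line above).

```python
import re

# Same case-insensitive match as: egrep -i "killed process"
OOM_PATTERN = re.compile(r"killed process", re.IGNORECASE)


def find_oom_kills(log_text: str) -> list:
    """Return the log lines that mention a killed process."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    # Invented sample input; real input would be the saved output of dmesg.
    sample_log = "\n".join([
        "host kernel: some unrelated message",
        "host kernel: Out of Memory: Killed process 1234 (python).",
    ])
    for line in find_oom_kills(sample_log):
        print(line)
```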

This question and answer originally appeared on Stack Overflow: https://stackoverflow.com/questions/57370803/
