Python 3: can't pickle _thread.RLock objects when using multiprocessing on a list

I am trying to parse a website that contains car attributes (154 kinds of attributes). I have a huge list (named liste_test) that contains 280,000 used-car listing URLs.

def araba_cekici(liste_test,headers,engine):
    for link in liste_test:
        try:
            page = requests.get(link, headers=headers)
        .....
        .....

When I start my code like this:

araba_cekici(liste_test,headers,engine)

it works and produces results. But in about 1 hour I could only get the attributes of 1,500 URLs. It is very slow, so I have to use multiprocessing.

I found a multiprocessing approach here and applied it to my code, but unfortunately it doesn't work.

import sys

import numpy as np
import multiprocessing as multi

def chunks(n, page_list):
    """Splits the list into n chunks"""
    return np.array_split(page_list,n)

cpus = multi.cpu_count()

workers = []   
page_bins = chunks(cpus, liste_test)


for cpu in range(cpus):
    sys.stdout.write("CPU " + str(cpu) + "\n")
    # Process that will send the corresponding list of pages
    # to the function araba_cekici
    worker = multi.Process(name=str(cpu), 
                           target=araba_cekici, 
                           args=(page_bins[cpu],headers,engine))
    worker.start()
    workers.append(worker)

for worker in workers:
    worker.join()

It gives:

TypeError: can't pickle _thread.RLock objects

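I found some answers about this error, but none of them worked (at least I couldn't apply them to my code). I also tried Python's multiprocessing Pool, but unfortunately it gets stuck in Jupyter Notebook and the code seems to run forever.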

Best answer

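A late answer, but since this question comes up when searching on Google: multiprocessing sends data to the worker processes via a multiprocessing.Queue, which requires every object that is sent to be picklable.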
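If it is not obvious which argument is the culprit, one option is to try pickling each one by hand before starting any process. The following is only a rough sketch: FakeEngine is a hypothetical stand-in for the engine from the question (its real type is not shown), and find_unpicklable is a helper name made up for this example.

```python
import pickle
import threading


class FakeEngine:
    """Hypothetical stand-in for `engine`; assumed to own a lock internally."""

    def __init__(self):
        self._lock = threading.RLock()


def find_unpicklable(**named_args):
    """Return the names of the arguments that pickle refuses to serialize."""
    bad = []
    for name, value in named_args.items():
        try:
            pickle.dumps(value)
        except (TypeError, AttributeError, pickle.PicklingError):
            bad.append(name)
    return bad


headers = {"User-Agent": "example"}  # plain dict of strings, picklable
print(find_unpicklable(headers=headers, engine=FakeEngine()))  # prints ['engine']
```

Any name reported by this check cannot be handed to a worker process as it is.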

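In your code you try to pass headers and engine, whose implementations you don't show. (Since headers holds the HTTP request headers, I suspect engine is the problem here.) To solve your problem, you either have to make engine picklable or instantiate engine only inside the worker process.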
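The second option could look roughly like the sketch below. It rests on several assumptions: FakeEngine and make_engine are hypothetical placeholders for however engine is really created, the run helper is invented for this example, and the request and parsing step is reduced to a comment. The point is only that araba_cekici no longer receives engine as an argument, so nothing holding a lock has to be pickled.

```python
import multiprocessing as multi
import threading


class FakeEngine:
    """Hypothetical stand-in for `engine`: it holds a lock, so it cannot be pickled."""

    def __init__(self):
        self._lock = threading.RLock()
        self.saved = []

    def save(self, record):
        with self._lock:
            self.saved.append(record)


def make_engine():
    # Hypothetical factory; replace with however `engine` is really created.
    return FakeEngine()


def araba_cekici(links, headers):
    # The engine is built here, inside the worker process, so it is never pickled.
    engine = make_engine()
    for link in links:
        # Placeholder for the real requests.get(link, headers=headers) and parsing.
        engine.save({"url": link})
    return len(engine.saved)


def run(links, headers, n_workers):
    # Only plain lists and dicts are passed to the worker processes.
    chunks = [links[i::n_workers] for i in range(n_workers)]
    workers = [
        multi.Process(target=araba_cekici, args=(chunk, headers))
        for chunk in chunks
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return [worker.exitcode for worker in workers]


if __name__ == "__main__":
    links = ["https://example.com/ad/%d" % i for i in range(8)]
    print(run(links, {"User-Agent": "example"}, n_workers=2))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are spawned rather than forked, because each worker imports the main module again.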

Regarding "Python 3: can't pickle _thread.RLock objects when using multiprocessing on a list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50391854/
