I have a list of search queries for building a dataset:
classes = [...]
The list contains 100 search queries. Basically, I split the list into 4 chunks of 25 queries each:
def divide_chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

classes = list(divide_chunks(classes, 25))
Below, I created a function that iterates over one chunk and downloads each query in it:
def download_chunk(n):
    for label in classes[n]:
        try:
            downloader.download(label, limit=1000, output_dir='dataset',
                                adult_filter_off=True, force_replace=False, verbose=True)
        except:
            pass
However, I want to run all 4 chunks concurrently. In other words, I want to run 4 separate iterations at the same time. I tried both the Threading and Multiprocessing approaches, but neither of them works:
process_1 = Process(target=download_chunk(0))
process_1.start()
process_2 = Process(target=download_chunk(1))
process_2.start()
process_3 = Process(target=download_chunk(2))
process_3.start()
process_4 = Process(target=download_chunk(3))
process_4.start()
process_1.join()
process_2.join()
process_3.join()
process_4.join()
###########################################################
thread_1 = threading.Thread(target=download_chunk(0)).start()
thread_2 = threading.Thread(target=download_chunk(1)).start()
thread_3 = threading.Thread(target=download_chunk(2)).start()
thread_4 = threading.Thread(target=download_chunk(3)).start()
Best Answer
You are running download_chunk outside of the thread/process. You need to supply the function and its arguments separately in order to delay execution:
For example:
Process(target=download_chunk, args=(0,))
Refer to the multiprocessing docs for more information about using the multiprocessing.Process class.
For this use case, I recommend using multiprocessing.Pool:
from multiprocessing import Pool

if __name__ == '__main__':
    with Pool(4) as pool:
        pool.map(download_chunk, range(4))
It handles creating, starting, and subsequently joining the 4 processes. Each process calls download_chunk with one of the arguments from the supplied iterable, in this case range(4).
More info about multiprocessing.Pool can be found in the docs.
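Since the downloads are I/O-bound, a thread pool is also a reasonable fit (a sketch using the standard-library concurrent.futures module; the download_chunk here is a self-contained stand-in for the questioner's function):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real download_chunk so the sketch runs on its own.
def download_chunk(n):
    return f"chunk {n} done"

# map() dispatches download_chunk(0)..download_chunk(3) across 4 threads
# and yields the results in input order.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(download_chunk, range(4)))
```

Threads avoid the pickling and process-startup overhead of multiprocessing, which the GIL does not penalize here because the workers spend their time waiting on network I/O.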
Regarding "python - Downloading images concurrently in Python using threading/multiprocessing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69307093/