I have a list of search queries for building a dataset:
classes = [...]
The list contains 100 search queries. Basically, I split the list into 4 chunks of 25 queries each:
def divide_chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

classes = list(divide_chunks(classes, 25))
Below, I created a function that iterates over one chunk and downloads each query in it:
def download_chunk(n):
    for label in classes[n]:
        try:
            downloader.download(label, limit=1000, output_dir='dataset',
                                adult_filter_off=True, force_replace=False, verbose=True)
        except:
            pass
However, I want to run all 4 chunks concurrently. In other words, I want to run 4 separate iterations at the same time. I tried both the Threading and Multiprocessing approaches, but neither of them works:
process_1 = Process(target=download_chunk(0))
process_1.start()
process_2 = Process(target=download_chunk(1))
process_2.start()
process_3 = Process(target=download_chunk(2))
process_3.start()
process_4 = Process(target=download_chunk(3))
process_4.start()
process_1.join()
process_2.join()
process_3.join()
process_4.join()
###########################################################
thread_1 = threading.Thread(target=download_chunk(0)).start()
thread_2 = threading.Thread(target=download_chunk(1)).start()
thread_3 = threading.Thread(target=download_chunk(2)).start()
thread_4 = threading.Thread(target=download_chunk(3)).start()
Best Answer
You are running download_chunk outside of the thread/process. You need to supply the function and its arguments separately in order to delay execution:
For example:
Process(target=download_chunk, args=(0,))
Refer to the multiprocessing docs for more information about using the multiprocessing.Process class.
For this use case, I recommend using multiprocessing.Pool:
from multiprocessing import Pool

if __name__ == '__main__':
    with Pool(4) as pool:
        pool.map(download_chunk, range(4))
It handles creating, starting, and subsequently joining the 4 processes. Each process calls download_chunk with one of the arguments from the supplied iterable, in this case range(4).
More info about multiprocessing.Pool can be found in the docs.
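Since the downloads are I/O-bound, a thread pool is also a reasonable fit (a sketch using the standard-library concurrent.futures module; the download_chunk here is a self-contained stand-in for the questioner's function):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real download_chunk so the sketch runs on its own.
def download_chunk(n):
    return f"chunk {n} done"

# map() dispatches download_chunk(0)..download_chunk(3) across 4 threads
# and yields the results in input order.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(download_chunk, range(4)))
```

Threads avoid the pickling and process-startup overhead of multiprocessing, which the GIL does not penalize here because the workers spend their time waiting on network I/O.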
Regarding "python - Downloading images concurrently in Python using threading/multiprocessing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69307093/