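I want to launch several processes in a loop, but since each one takes a long time to finish, I think it would be better to run them in parallel. The processes are all independent, i.e. none of them depends on the result of another. Here is a small example of the kind of loop I am working with: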
inDir = '/path/to/your/dir/'
inTxtList = ['a.txt','b.txt','c.txt','d.txt','e.txt']
for i in inTxtList:
    myfile = open(i,'w')
    myfile.write("This is a text file written in python\n")
    myfile.close()
I tried the multiprocessing package and came up with the following code:
import multiprocessing

def worker(num):
    """process worker function"""
    myfile = open(num,'w')
    myfile.write("This is my first text file written in python\n")
    myfile.close()
    return

if __name__ == '__main__':
    jobs = []
    for i in inTxtList:
        p = multiprocessing.Process(target=worker, args=(inDir+i,))
        jobs.append(p)
        p.start()
    # join only after every process has been started, so they overlap
    for p in jobs:
        p.join()
It does work, but I don't know how to set the number of workers. Can you help me?
Best answer
Use multiprocessing.Pool.map. You can set the number of worker processes by passing the processes argument when you create the Pool object:
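import os
import multiprocessing

# inDir and inTxtList are the values from the question
inDir = '/path/to/your/dir/'
inTxtList = ['a.txt','b.txt','c.txt','d.txt','e.txt']

def worker(num):
    with open(num, 'w') as f:
        f.write("This is my first text file written in python\n")

if __name__ == '__main__':
    number_of_workers = 4
    pool = multiprocessing.Pool(processes=number_of_workers)
    pool.map(worker, [os.path.join(inDir, i) for i in inTxtList])
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to exit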
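If you are unsure which number to pass as processes, here is a minimal sketch of one way to choose it. It assumes you never want more workers than tasks or CPUs. The names write_file and pick_worker_count are hypothetical helpers invented for this illustration, and a temporary directory stands in for inDir. When processes is left as None, Pool falls back to the CPU count reported by the OS.

```python
import multiprocessing
import os
import tempfile


def write_file(path):
    """Write one line of text to path; runs inside a worker process."""
    with open(path, 'w') as f:
        f.write("This is a text file written in python\n")
    return path


def pick_worker_count(n_tasks):
    """Hypothetical helper: cap workers at the task count and the CPU count."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(n_tasks, cpus))


if __name__ == '__main__':
    out_dir = tempfile.mkdtemp()  # assumption: stand-in for inDir
    names = ['a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt']
    paths = [os.path.join(out_dir, n) for n in names]

    # map() blocks until every task is done, so it is safe for the
    # with-block to shut the pool down on exit.
    with multiprocessing.Pool(processes=pick_worker_count(len(paths))) as pool:
        written = pool.map(write_file, paths)
    print(written)
```

For work that is mostly waiting on disk or network rather than computing, a larger number than the CPU count may still help, so treat the cap above as a starting point rather than a rule.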
By the way, use os.path.join rather than concatenating path components by hand.
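To see why, here is a small sketch comparing the two approaches. The directory strings are made up for illustration. Plain concatenation only works if the directory happens to end with a separator, while os.path.join inserts one when it is missing.

```python
import os

dir_with_slash = '/path/to/your/dir/'
dir_without_slash = '/path/to/your/dir'  # same directory, no trailing slash

# Manual concatenation silently glues the file name onto the directory name.
concatenated = dir_without_slash + 'a.txt'

# os.path.join adds a separator only when one is needed.
joined_without = os.path.join(dir_without_slash, 'a.txt')
joined_with = os.path.join(dir_with_slash, 'a.txt')

print(concatenated)
print(joined_without)
print(joined_with)
```

On a POSIX system the first line printed ends in dira.txt, which is not the intended file, while both os.path.join results point at a.txt inside the directory.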
This question about python - parallel independent processes comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/20978577/