python - 如何使用多线程来优化人脸检测?

标签 python multithreading python-multithreading

我有一个代码,它使用 CSV 文件中的图像 URL 列表,然后对这些图像执行面部检测,然后加载一些模型并对这些图像进行预测。

我做了一些负载测试,发现代码中的 get_face 函数花费了生成结果所需的大部分时间,而额外的时间则用于为预测创建的 pickle 文件。

问题:是否有可能通过在线程中运行这些进程来减少时间,以及如何以多线程方式实现这一点?

下面是代码示例:

from __future__ import division
import numpy as np

from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize

df = pd.read_csv('/home/instaurls.csv')
detector = dlib.get_frontal_face_detector()
img_width, img_height = 139, 139
confidence = 0.8

def get_face():
    output = None
    data1 = []
    for row in df.itertuples():
        img = io.imread(row[1])
        dets = detector(img, 1)
        for i, d in enumerate(dets):
            img = img[d.top():d.bottom(), d.left():d.right()]
            img = resize(img, (img_width, img_height))
            output = np.expand_dims(img, axis=0)
            break
        data1.append(output)
    data1 = np.concatenate(data1)
    return data1

get_face()

csv 示例

data
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg

最佳答案

以下是您可以尝试并行执行的方法:

from __future__ import division
import numpy as np

from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize
from csv import DictReader

df = DictReader(open('/home/instaurls.csv')) # DictReader is iterable
detector = dlib.get_frontal_face_detector() 
img_width, img_height = 139, 139
confidence = 0.8

def get_face(row):
    """
    Here row is dictionary where keys are CSV header names
    and values are values from current CSV row.
    """
    output = None

    img = io.imread(row[1]) # row[1] has to be changed to row['data']?
    dets = detector(img, 1)
    for i, d in enumerate(dets):
        img = img[d.top():d.bottom(), d.left():d.right()]
        img = resize(img, (img_width, img_height))
        output = np.expand_dims(img, axis=0)
        break

    return output

if __name__ == '__main__':
    pool = Pool() # default to number CPU cores
    data = list(pool.imap(get_face, df))
    print np.concatenate(data)

注意 get_face 及其参数。另外,还有它返回的内容。这就是我所说的小块工作的意思。现在,get_face 处理 CSV 中的一行。

当您运行此脚本时,pool 将是对 Pool 实例的引用,然后您为每一行调用 get_face/df.itertuples() 中的元组。

一切完成后,data 保存处理数据,然后您对其执行 np.concatenate 操作。

关于python - 如何使用多线程来优化人脸检测?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48881833/

相关文章:

python - python 中共享数组的 RawArray 的最大大小

python - 国际化 - Python MySQLDb 和 ISO-8859-7

multithreading - 达到特定核心数后出现严重的多线程内存瓶颈

Java 进程优先于其他 Windows 进程

python - 附加以列出线程的结果

python多线程性能

python - 有条件地填充 pandas groupby 对象中元素的有效方法(可能通过应用函数)

Python - turtle 重建图像太慢

C 程序卡住,不进入 main()

python - 限制一次运行的最大线程数的正确方法?