python - Cloud Vision API client throws OS error "too many open files"

Tags: python ubuntu google-cloud-vision urllib3

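When I run label detection through the Cloud Vision API client from Python, I get a "Too many open files" error.
Before posting here I asked about this on GitHub, and a maintainer suggested that it is a general Python problem rather than an API problem.
Even with that advice, I still don't understand why Python raises "Too many open files".
I added logging, and it shows that urllib3 raises the error, although I never import that package explicitly.
What am I doing wrong? Please help.
My environment is: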

  • Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-112-generic x86_64)
  • Python 3.5.2
  • google-cloud-vision (0.31.1)

Error log:

[2018-05-25 20:18:46,573] {label_detection.py:60} DEBUG - success open decile_data/image/src/00000814.jpg
[2018-05-25 20:18:46,573] {label_detection.py:62} DEBUG - success convert image to types.Image
[2018-05-25 20:18:46,657] {requests.py:117} DEBUG - Making request: POST https://accounts.google.com/o/oauth2/token
[2018-05-25 20:18:46,657] {connectionpool.py:824} DEBUG - Starting new HTTPS connection (1): accounts.google.com
[2018-05-25 20:18:46,775] {connectionpool.py:396} DEBUG - https://accounts.google.com:443 "POST /o/oauth2/token HTTP/1.1" 200 None
[2018-05-25 20:18:47,803] {label_detection.py:60} DEBUG - success open decile_data/image/src/00000815.jpg
[2018-05-25 20:18:47,803] {label_detection.py:62} DEBUG - success convert image to types.Image
[2018-05-25 20:18:47,896] {requests.py:117} DEBUG - Making request: POST https://accounts.google.com/o/oauth2/token
[2018-05-25 20:18:47,896] {connectionpool.py:824} DEBUG - Starting new HTTPS connection (1): accounts.google.com
[2018-05-25 20:18:47,902] {_plugin_wrapping.py:81} ERROR - AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7fcd94eb7dd8>" raised exception!
Traceback (most recent call last):
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/util/ssl_.py", line 313, in ssl_wrap_socket
OSError: [Errno 24] Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 601, in urlopen
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 346, in _make_request
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connection.py", line 326, in connect
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/util/ssl_.py", line 315, in ssl_wrap_socket
urllib3.exceptions.SSLError: [Errno 24] Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/requests/adapters.py", line 440, in send
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 639, in urlopen
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/util/retry.py", line 388, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by SSLError(OSError(24, 'Too many open files'),))

The script that produced the error above is as follows:

# -*- coding: utf-8 -*-
""" Detecting labels of images using Google Cloud Vision. """

import argparse
import csv
from datetime import datetime
import os
import logging
from pathlib import Path
import sys
from google.cloud import vision
from google.cloud.vision import types


logger= logging.getLogger(__name__)


def get_commandline_args():
    parser = argparse.ArgumentParser(
        description='Detecting labels of images using Google Cloud Vision.')

    parser.add_argument('--image-dir',
                        type=str,
                        required=True,
                        help='Directory in which images are saved.')
    parser.add_argument('--output-path',
                        type=str,
                        required=True,
                        help='Path of output file. This is saved as CSV.')
    parser.add_argument('--max-results',
                        type=int,
                        required=False,
                        default=5,
                        help=('Maximum number of resulting labels.'
                              ' Default is 5.'))
    parser.add_argument('--debug',
                        type=bool,
                        required=False,
                        default=False,
                        help=('Whether running to debug.'
                              ' If True, this script will run on 3 files.'
                              ' Default is False.'))
    return parser.parse_args()


def load_image(path):
    """ Load an image for use with the Google Cloud Vision client API.

    Args:
        path (str): a path of an image.

    Returns:
        img : an object which is google.cloud.vision.types.Image.

    Raises:
        IOError: if 'open' fails to load the image.
    """
    with open(path, 'rb') as f:
        content = f.read()
    logger.debug('success open {}'.format(path))
    img = types.Image(content=content)
    logger.debug('success convert image to types.Image')

    return img


def detect_labels_of_image(path, max_results):
    _path = Path(path)
    client = vision.ImageAnnotatorClient()
    image = load_image(path=str(_path))
    execution_time = datetime.now()
    response = client.label_detection(image=image, max_results=max_results)
    labels = response.label_annotations
    for label in labels:
        record = (str(_path), _path.name, label.description,
                  label.score, execution_time.strftime('%Y-%m-%d %H:%M:%S'))
        yield record


def main():
    args = get_commandline_args()

    file_handler = logging.FileHandler(filename='label_detection.log')
    logging.basicConfig(
        level=logging.DEBUG,
        format='[%(asctime)s] {%(filename)s:%(lineno)s} %(levelname)s - %(message)s',
        handlers=[file_handler]
    )

    image_dir = args.image_dir

    with open(args.output_path, 'w') as fout:

        writer = csv.writer(fout, lineterminator='\n')
        header = ['path', 'filename', 'label', 'score', 'executed_at']
        writer.writerow(header)

        image_file_lists = os.listdir(image_dir)
        image_file_lists.sort()
        if args.debug:
            image_file_lists = image_file_lists[:3]

        for filename in image_file_lists:
            path = os.path.join(image_dir, filename)
            try:
                results = detect_labels_of_image(path, args.max_results)
            except Exception as e:
                logger.warning(e)
                logger.warning('skipped processing {} due to above exception.'.format(path))
            for record in results:
                writer.writerow(record)


if __name__ == '__main__':
    main()

Best answer

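This is not a Google limit that you are hitting. I think you are reaching the maximum number of open files allowed per process. You can check all of the process's open files while it is running, with something like "lsof". I'd guess you will see a lot of open IPv4 and IPv6 connections. If so, read on.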
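If running lsof from another terminal is inconvenient, the same check can be approximated from inside the script. The following is only a sketch, assuming Linux (it reads /proc/self/fd); the helper names open_fd_count and fd_soft_limit are hypothetical and not part of the original script.

```python
import os
import resource


def open_fd_count():
    """Count this process's open file descriptors.

    Linux only: relies on the /proc filesystem. The count may include
    the descriptor used to list the directory itself.
    """
    return len(os.listdir('/proc/self/fd'))


def fd_soft_limit():
    """Return the soft limit on open files for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == '__main__':
    print('open descriptors:', open_fd_count())
    print('soft limit:', fd_soft_limit())
```

Logging open_fd_count() once per image inside the loop would show whether the number climbs steadily toward the limit, which is what a descriptor leak looks like.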

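Here you create the client for every image, which means a secure, authenticated connection is opened for every image.

Take the line "client = vision.ImageAnnotatorClient()" out of that function and make the client global. A single open connection will then be reused. This should solve your problem.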
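A minimal sketch of that change, assuming the rest of the script stays as in the question. Instead of a module-level global, this variant creates the client once and passes it in as an argument, which has the same effect of reusing one connection. The helper name make_client is hypothetical, and the import is done lazily only so the function below can be exercised without the library installed.

```python
from datetime import datetime
from pathlib import Path


def make_client():
    """Create the Vision client. Call this once, before the loop."""
    # Lazy import (an assumption of this sketch, not of the original script).
    from google.cloud import vision
    return vision.ImageAnnotatorClient()


def detect_labels_of_image(client, image, path, max_results):
    """Same logic as in the question, but the client is passed in."""
    _path = Path(path)
    execution_time = datetime.now()
    response = client.label_detection(image=image, max_results=max_results)
    # Return a list rather than yielding, so the request runs here.
    return [
        (str(_path), _path.name, label.description, label.score,
         execution_time.strftime('%Y-%m-%d %H:%M:%S'))
        for label in response.label_annotations
    ]


# Intended use inside main() (load_image is the function from the question):
#
#     client = make_client()              # once, outside the loop
#     for filename in image_file_lists:
#         path = os.path.join(image_dir, filename)
#         image = load_image(path)
#         for record in detect_labels_of_image(client, image, path,
#                                              args.max_results):
#             writer.writerow(record)
```

Returning a list is a separate design choice: the original function is a generator, so its body does not run until the results are iterated, and an exception from the API call would be raised outside the try block in main().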

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/50545515/
