python - urllib: too many open files

Tags: python urllib

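I run into a problem when processing more than 2,700 files. With a small number of files, say a few hundred, everything works. My guess is that this is related to Windows limiting the number of open files, similar to the limit that ulimit defines system-wide on Linux. I'm sure something isn't being closed, and that is why I get this error.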

I have a function that sends a file via POST:

def upload_photos(url_photo, info, timeout):
    # `info` holds the already-open file tuple, the form fields, and the file name
    photo = info['photo']
    data_photo = info['data']
    name = info['name']
    response = requests.post(url_photo, data=data_photo, files=photo, timeout=timeout)
    return {'json': response.json(), 'name': name}

It is called from a loop over a directory listing:

    for photo_path in [p.lower() for p in photos_path]:
        if ('jpg' in photo_path or 'jpeg' in photo_path) and "thumb" not in photo_path:
            nr_photos_upload +=1
    print("Found " + str(nr_photos_upload) + " pictures to upload")
    local_count = 0
    list_to_upload = []
    for photo_path in [p.lower() for p in photos_path]:
        local_count += 1
        if ('jpg' in photo_path or 'jpeg' in photo_path) and "thumb" not in photo_path and local_count > count:
            total_img = nr_photos_upload
            photo_name = os.path.basename(photo_path)
            try :
                photo = {'photo': (photo_name, open(path + photo_path, 'rb'), 'image/jpeg')}
                try:
                    latitude, longitude, compas = get_gps_lat_long_compass(path + photo_path)
                except ValueError as e:
                    if e != None:
                        try:
                            tags = exifread.process_file(open(path + photo_path, 'rb'))
                            latitude, longitude = get_exif_location(tags)
                            compas = -1
                        except Exception:
                            continue
                if compas == -1:
                    data_photo = {'coordinate'    : str(latitude) + "," + str(longitude),
                               'sequenceId'       : id_sequence,
                               'sequenceIndex'    : count
                               }
                else :
                    data_photo = {'coordinate'    : str(latitude) + "," + str(longitude),
                               'sequenceId'       : id_sequence,
                               'sequenceIndex'    : count,
                               'headers'          : compas
                               }
                info_to_upload = {'data': data_photo, 'photo':photo, 'name': photo_name}
                list_to_upload.append(info_to_upload)
                count += 1
            except Exception as ex:
                print(ex)
    count_uploaded = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Upload feature called from here
        future_to_url = {executor.submit(upload_photos, url_photo, item, 100): item for item in list_to_upload}
        for future in concurrent.futures.as_completed(future_to_url):
            try:
                data = future.result()['json']
                name = future.result()['name']
                print("processing {}".format(name))
                if data['status']['apiCode'] == "600":

                    percentage = float((float(count_uploaded) * 100) / float(total_img))
                    print(("Uploaded - " + str(count_uploaded) + ' of total :' + str(
                        total_img) + ", percentage: " + str(round(percentage, 2)) + "%"))
                elif data['status']['apiCode'] == "610":
                    print("skipping - a requirement arguments is missing for upload")
                elif data['status']['apiCode'] == "611":
                    print("skipping - image does not have GPS location metadata")
                elif data['status']['apiCode'] == "660":
                    print("skipping - duplicate image")
                else :
                    print("skipping - bad image")
                count_uploaded += 1
                with open(path + "count_file.txt", "w") as fis:
                    fis.write((str(count_uploaded)))
            except Exception as exc:
                print('generated an exception: %s' % exc)

Best answer

In C, you can call _setmaxstdio to change the number of files that can be open at one time.

For Python, you have to use win32file from pywin32:

import win32file
win32file._setmaxstdio(1024)  # raise the maximum number of open files to 1024

The default is 512. Make sure the maximum you set is supported by your platform.
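
One way to check the limit before and after changing it is sketched below. It assumes pywin32 is installed and relies on win32file._getmaxstdio, which pywin32 exposes alongside _setmaxstdio; the helper name raise_stdio_limit is made up for this illustration. On a machine without pywin32 the helper simply returns None.

```python
try:
    import win32file  # part of pywin32; only available on Windows
except ImportError:
    win32file = None


def raise_stdio_limit(new_limit):
    """Hypothetical helper: try to raise the C runtime's open-file limit.

    Returns the limit in effect afterwards, or None when pywin32 is missing.
    """
    if win32file is None:
        return None
    current = win32file._getmaxstdio()
    if new_limit > current:
        try:
            win32file._setmaxstdio(new_limit)
        except Exception as exc:
            # The platform rejected the requested value; keep the old limit.
            print("could not raise limit: {}".format(exc))
    # Read the limit back instead of assuming the request succeeded.
    return win32file._getmaxstdio()


if __name__ == "__main__":
    print("stdio limit now:", raise_stdio_limit(1024))
```

Reading the value back after the call is the point of the sketch: it shows whether the requested maximum was actually accepted.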

Reference: https://msdn.microsoft.com/en-us/library/6e3b887c.aspx
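
Raising the limit only moves the ceiling. The loop in the question calls open(path + photo_path, 'rb') for every photo and stores the handle in list_to_upload before any upload starts, so all handles are open at once. A possible alternative, not part of the answer above, is to store only the path and open each file inside the upload function. The sketch below assumes a hypothetical 'path' key in each entry and an injectable post callable (defaulting to requests.post) so it can be tried without a server.

```python
def upload_photo_lazily(url_photo, info, timeout, post=None):
    """Open the photo only for the duration of one request, then close it.

    `info` is assumed to look like {'path': ..., 'name': ..., 'data': ...};
    the 'path' key is hypothetical and replaces the pre-opened file handle.
    """
    if post is None:
        import requests  # imported lazily so the sketch loads without it
        post = requests.post
    # The `with` block closes the handle as soon as the request returns,
    # so at most `max_workers` files are open at any moment.
    with open(info['path'], 'rb') as fh:
        files = {'photo': (info['name'], fh, 'image/jpeg')}
        response = post(url_photo, data=info['data'], files=files, timeout=timeout)
    return {'json': response.json(), 'name': info['name']}
```

With this shape the executor.submit call stays the same apart from the function name, and the number of simultaneously open files is bounded by the thread pool size rather than by the number of photos.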

Regarding "python - urllib: too many open files", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38559318/
