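python - Multipart form encoding and posting with urllib3

Tags: python urllib2 urllib3

I am trying to upload a csv file to this site. However, I have run into some problems, which I think stem from an incorrect mimetype (possibly).

I am trying to post the file manually through urllib2, so my code looks like this: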

import sys  # needed for sys.exc_info() in http_request below
import urllib
import urllib2
import mimetools, mimetypes
import os, stat
from cStringIO import StringIO

#============================
# Note: I found this recipe online. I can't remember where exactly though.. 
#=============================

class Callable:
    def __init__(self, anycallable):
        self.__call__ = anycallable

# Controls how sequences are encoded. If true, elements may be given multiple values by
#  assigning a sequence.
doseq = 1

class MultipartPostHandler(urllib2.BaseHandler):
    handler_order = urllib2.HTTPHandler.handler_order - 10 # needs to run first

    def http_request(self, request):
        data = request.get_data()
        if data is not None and type(data) != str:
            v_files = []
            v_vars = []
            try:
                 for(key, value) in data.items():
                     if type(value) == file:
                         v_files.append((key, value))
                     else:
                         v_vars.append((key, value))
            except TypeError:
                systype, value, traceback = sys.exc_info()
                raise TypeError, "not a valid non-string sequence or mapping object", traceback

            if len(v_files) == 0:
                data = urllib.urlencode(v_vars, doseq)
            else:
                boundary, data = self.multipart_encode(v_vars, v_files)

                contenttype = 'multipart/form-data; boundary=%s' % boundary
                if(request.has_header('Content-Type')
                   and request.get_header('Content-Type').find('multipart/form-data') != 0):
                    print "Replacing %s with %s" % (request.get_header('content-type'), 'multipart/form-data')
                request.add_unredirected_header('Content-Type', contenttype)

            request.add_data(data)

        return request

    def multipart_encode(vars, files, boundary = None, buf = None):
        if boundary is None:
            boundary = mimetools.choose_boundary()
        if buf is None:
            buf = StringIO()
        for(key, value) in vars:
            buf.write('--%s\r\n' % boundary)
            buf.write('Content-Disposition: form-data; name="%s"' % key)
            buf.write('\r\n\r\n' + value + '\r\n')
        for(key, fd) in files:
            file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
            filename = fd.name.split('/')[-1]
            contenttype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
            buf.write('--%s\r\n' % boundary)
            buf.write('Content-Disposition: form-data; name="%s"; filename="%s"\r\n' % (key, filename))
            buf.write('Content-Type: %s\r\n' % contenttype)
            # buffer += 'Content-Length: %s\r\n' % file_size
            fd.seek(0)
            buf.write('\r\n' + fd.read() + '\r\n')
        buf.write('--' + boundary + '--\r\n\r\n')
        buf = buf.getvalue()
        return boundary, buf
    multipart_encode = Callable(multipart_encode)

    https_request = http_request

# Module-level code: this must run after the class body has finished,
# otherwise MultipartPostHandler is not yet defined.
import cookielib
cookies = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),
        MultipartPostHandler)

opener.addheaders = [(
        'User-agent', 
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6'
    )]


params = {"FILENAME" : open("weather_scrape.csv", 'rb'),
        'CGIREF' : '/calludt.cgi/DDFILE1',
        'USE':'MODEL',
        'MODEL':'CM',
        'CROP':'APPLES',
        'METHOD': 'SS',
        'UNITS' : 'E',
        'LOWTHRESHOLD': '50',
        'UPTHRESHOLD': '88',
        'CUTOFF':'H',
        'COUNTY':'AL',
        'ACTIVE':'Y',
        'FROMMONTH':'3',
        'FROMDAY':'15',
        'FROMYEAR': '2013',
        'THRUMONTH':'5',
        'THRUDAY':'13',
        'THRUYEAR':'2013',
        'DATASOURCE' : 'FILE'
        }

response = opener.open("http://www.ipm.ucdavis.edu/WEATHER/textupload.cgi", params)

Now, when I post this, everything seems fine until I click the submit button on the follow-up page returned by the first POST. Then I get this error message:

ERROR (bad data) in file 'weather.csv' at line 135.

Data record = [--192.168.117.2.1.4404.1368589639.796.1--]

Too few values found. Check delimiter specification.

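After inspecting the POST request that my browser sends when I do the same thing by hand, I noticed that the content type of the file part is very specific, namely: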

------WebKitFormBoundaryfBp6Jfhv7LlPZLKd
Content-Disposition: form-data; name="FILENAME"; filename="weather.csv"
Content-Type: application/vnd.ms-excel

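I'm not entirely sure the content type is what causes the error, but it is what I'm currently trying to rule out (since I don't know what exactly is going wrong). I couldn't see any way to set the content type through urllib2, so after some googling I stumbled upon urllib3.

urllib3 has built-in file posting functionality, but I'm not entirely sure how to use it.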
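
For reference, the per-file content type is only a header line inside each part of the multipart body, so a hand-rolled encoder can set it explicitly. Below is a minimal Python 3 sketch using only the standard library. It is not the recipe from the question: the function name `encode_multipart`, its argument layout, and the sample CSV bytes are assumptions made for illustration.

```python
import io
import mimetypes
import uuid


def encode_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body (illustrative helper, not a library API).

    fields: iterable of (name, text value)
    files:  iterable of (name, filename, data as bytes, content type or None)
    Returns (value for the Content-Type request header, body as bytes).
    """
    if boundary is None:
        boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields:
        buf.write(("--%s\r\n" % boundary).encode("ascii"))
        header = 'Content-Disposition: form-data; name="%s"\r\n\r\n' % name
        buf.write(header.encode("utf-8"))
        buf.write(value.encode("utf-8") + b"\r\n")
    for name, filename, data, content_type in files:
        if content_type is None:
            # Fall back to guessing from the file extension.
            content_type = (mimetypes.guess_type(filename)[0]
                            or "application/octet-stream")
        buf.write(("--%s\r\n" % boundary).encode("ascii"))
        header = ('Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                  % (name, filename))
        buf.write(header.encode("utf-8"))
        buf.write(("Content-Type: %s\r\n\r\n" % content_type).encode("ascii"))
        buf.write(data + b"\r\n")
    # Closing delimiter: the boundary followed by two extra dashes.
    buf.write(("--%s--\r\n" % boundary).encode("ascii"))
    return "multipart/form-data; boundary=%s" % boundary, buf.getvalue()


# Hypothetical sample data standing in for the real CSV file.
content_type, body = encode_multipart(
    [("DATASOURCE", "FILE")],
    [("FILENAME", "weather.csv", b"a,b\n1,2\n", "application/vnd.ms-excel")],
    boundary="example-boundary",
)
print(content_type)
print(body.decode("utf-8"))
```

The returned pair could then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type})`. Whether the server in question actually cares about the part's content type is not something this sketch can confirm.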

Filepost

urllib3.filepost.encode_multipart_formdata(fields, boundary=None)
Encode a dictionary of fields using the multipart/form-data MIME format.

Parameters: 
fields –
Dictionary of fields or list of (key, value) or (key, value, MIME type) field tuples. The key is treated as the field name, and the value as the body of the form-data bytes. If the value is a tuple of two elements, then the first element is treated as the filename of the form-data section and a suitable MIME type is guessed based on the filename. If the value is a tuple of three elements, then the third element is treated as an explicit MIME type of the form-data section.
Field names and filenames must be unicode.
boundary – If not specified, then a random boundary will be generated using mimetools.choose_boundary().
urllib3.filepost.iter_fields(fields)
Iterate over fields.

Supports list of (k, v) tuples and dicts.

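Using this library, I tried to encode the values as described in the documentation, but I get errors.

Initially, just to test it, I tried passing a dict: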

params = {"FILENAME" : open("weather.csv", 'rb'),
            'CGIREF' : '/calludt.cgi/DDFILE1',
            'USE':'MODEL',
            'MODEL':'CM',
            'CROP':'APPLES',
            'METHOD': 'SS',
            'UNITS' : 'E',
            'LOWTHRESHOLD': '50',
            'UPTHRESHOLD': '88',
            'CUTOFF':'H',
            'COUNTY':'AL',
            'ACTIVE':'Y',
            'FROMMONTH':'3',
            'FROMDAY':'15',
            'FROMYEAR': '2013',
            'THRUMONTH':'5',
            'THRUDAY':'13',
            'THRUYEAR':'2013',
            'DATASOURCE' : 'FILE'
            }

values = urllib3.filepost.encode_multipart_formdata(params)

However, this raises the following error:

    values = urllib3.filepost.encode_multipart_formdata(params)
  File "c:\python27\lib\site-packages\urllib3-dev-py2.7.egg\urllib3\filepost.py", line 90, in encode_multipart_formdata
    body.write(data)
TypeError: 'file' does not have the buffer interface

Not sure what was causing that, I tried passing in a list of (key, value, mimetype) tuples instead, but that raises an error as well:

params = [
        ("FILENAME" , open("weather_scrape.csv"), 'application/vnd.ms-excel'),
        ('CGIREF' , '/calludt.cgi/DDFILE1'),
        ('USE','MODEL'),
        ('MODEL','CM'),
        ('CROP','APPLES'),
        ('METHOD', 'SS'),
        ('UNITS' , 'E'),
        ('LOWTHRESHOLD', '50'),
        ('UPTHRESHOLD', '88'),
        ('CUTOFF','H'),
        ('COUNTY','AL'),
        ('ACTIVE','Y'),
        ('FROMMONTH','3'),
        ('FROMDAY','15'),
        ('FROMYEAR', '2013'),
        ('THRUMONTH','5'),
        ('THRUDAY','13'),
        ('THRUYEAR','2013'),
        ('DATASOURCE' , 'FILE)')
        ]

values = urllib3.filepost.encode_multipart_formdata(params)



>>ValueError: too many values to unpack

Best answer

If you want to use urllib3 for this, it would look something like this:

import urllib3

http = urllib3.PoolManager()

headers = urllib3.make_headers(user_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')
url = "http://www.ipm.ucdavis.edu/WEATHER/textupload.cgi"
csv_data = open("weather_scrape.csv").read()

params = {
    "FILENAME": csv_data,
    'CGIREF': '/calludt.cgi/DDFILE1',
    'USE': 'MODEL',
    'MODEL': 'CM',
    'CROP': 'APPLES',
    'METHOD': 'SS',
    'UNITS' : 'E',
    'LOWTHRESHOLD': '50',
    'UPTHRESHOLD': '88',
    'CUTOFF': 'H',
    'COUNTY': 'AL',
    'ACTIVE': 'Y',
    'FROMMONTH': '3',
    'FROMDAY': '15',
    'FROMYEAR': '2013',
    'THRUMONTH': '5',
    'THRUDAY': '13',
    'THRUYEAR': '2013',
    'DATASOURCE' : 'FILE',
}

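# Keyword arguments keep the call unambiguous across urllib3 versions.
response = http.request('POST', url, fields=params, headers=headers)

I couldn't test it against your target url and csv dataset, so it may have some small bugs. But that's the general idea.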
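
To send the CSV as a file part with a filename and an explicit MIME type, as the browser request in the question does, the field value can be a `(filename, data, MIME type)` tuple, as described in the documentation quoted in the question. The sketch below only encodes the form locally with `encode_multipart_formdata` so the result can be inspected without touching the network. The CSV bytes are a made-up stand-in for the real file, and only a few of the form fields are shown.

```python
from urllib3.filepost import encode_multipart_formdata

# Hypothetical stand-in for the contents of weather_scrape.csv.
# Note that the value is the file's *contents* (bytes), not an open file object.
csv_data = b"date,tmax,tmin\n2013-03-15,61,42\n"

fields = {
    # (filename, file contents, explicit MIME type)
    "FILENAME": ("weather.csv", csv_data, "application/vnd.ms-excel"),
    "CGIREF": "/calludt.cgi/DDFILE1",
    "DATASOURCE": "FILE",
}

# Returns the encoded body and the matching Content-Type header value.
body, content_type = encode_multipart_formdata(fields)
print(content_type)
print(body.decode("utf-8"))
```

The same `fields` mapping can be passed to `http.request('POST', url, fields=fields, headers=headers)`, which performs this encoding itself. I have not verified that this resolves the server-side "bad data" error.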

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/16573809/
