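python - Why isn't a POST name with Unicode sent correctly when using multipart/form-data?

Tags: python http unicode python-requests multipartform-data

I want to send a POST request with a file attached, and some of the field names contain Unicode characters. The server does not receive those names correctly, as shown below: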

>>> # normal, without unicode
>>> resp = requests.post('http://httpbin.org/post', data={'snowman': 'hello'}, files={('kitten.jpg', open('kitten.jpg', 'rb'))}).json()['form']
>>> resp
{u'snowman': u'hello'}
>>>
>>> # with unicode, see that the name has become 'null'
>>> resp = requests.post('http://httpbin.org/post', data={'☃': 'hello'}, files={('kitten.jpg', open('kitten.jpg', 'rb'))}).json()['form']
>>> resp
{u'null': u'hello'}
>>>
>>> # it works without the image
>>> resp = requests.post('http://httpbin.org/post', data={'☃': 'hello'}).json()['form']
>>> resp
{u'\u2603': u'hello'}

How can I fix this?

Best answer

Judging from the wireshark comment, python-requests appears to be doing it wrong, but there may not be a "right answer".

RFC 2388 says:

Field names originally in non-ASCII character sets may be encoded within the value of the "name" parameter using the standard method described in RFC 2047.

RFC 2047, in turn, says:

Generally, an "encoded-word" is a sequence of printable ASCII characters that begins with "=?", ends with "?=", and has two "?"s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.

and goes on to describe the "Q" and "B" encoding methods. Using the "Q" (quoted-printable) method, the name would be:

=?utf-8?q?=E2=98=83?=
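To make the "Q" form concrete, here is a minimal sketch that builds such an encoded-word by hand. The helper name `q_encode_word` is hypothetical (it is not part of requests or the standard library), and the sketch ignores details such as the 75-character limit on encoded-words.

```python
def q_encode_word(text, charset="utf-8"):
    """Build an RFC 2047 "Q" encoded-word for text (hypothetical helper)."""
    pieces = []
    for byte in text.encode(charset):
        char = chr(byte)
        if byte < 128 and char.isalnum():
            pieces.append(char)            # ASCII letters and digits pass through
        elif char == " ":
            pieces.append("_")             # "Q" encoding writes a space as "_"
        else:
            pieces.append("=%02X" % byte)  # every other byte becomes =XX in hex
    return "=?%s?q?%s?=" % (charset, "".join(pieces))


encoded_name = q_encode_word("☃")
print(encoded_name)  # =?utf-8?q?=E2=98=83?=
```

The three `=XX` groups are simply the UTF-8 bytes of the snowman character.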

However, as RFC 6266 makes explicit (quoting section 5 of RFC 2047):

An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'.

So we are not allowed to do that. (Thanks to @Lukasa for catching this!)

RFC 2388 also says:

The original local file name may be supplied as well, either as a "filename" parameter either of the "content-disposition: form-data" header or, in the case of multiple files, in a "content-disposition: file" header of the subpart. The sending application MAY supply a file name; if the file name of the sender's operating system is not in US-ASCII, the file name might be approximated, or encoded using the method of RFC 2231.

RFC 2231 describes a method that looks more like what you are seeing. In it:

Asterisks ("*") are reused to provide the indicator that language and character set information is present and encoding is being used. A single quote ("'") is used to delimit the character set and language information at the beginning of the parameter value. Percent signs ("%") are used as the encoding flag, which agrees with RFC 2047.

Specifically, an asterisk at the end of a parameter name acts as an indicator that character set and language information may appear at the beginning of the parameter value. A single quote is used to separate the character set, language, and actual value information in the parameter value string, and an percent sign is used to flag octets encoded in hexadecimal.

That is, if this method is used (and supported on both ends), the name should be:

name*=utf-8''%E2%98%83
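As a rough illustration of this percent-encoded form, the sketch below builds and decodes such a value with the standard library's `urllib.parse`. The helper names `encode_ext_value` and `decode_ext_value` are made up for this example, and the decoder assumes the value is well formed (charset, language and data separated by two single quotes).

```python
from urllib.parse import quote, unquote


def encode_ext_value(text, charset="utf-8", language=""):
    """Return charset'language'percent-encoded-text (hypothetical helper)."""
    return "%s'%s'%s" % (charset, language, quote(text, safe="", encoding=charset))


def decode_ext_value(value):
    """Reverse encode_ext_value; assumes a well-formed value."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset)


ext_value = encode_ext_value("☃")
print("name*=" + ext_value)                # name*=utf-8''%E2%98%83
print(decode_ext_value(ext_value) == "☃")  # True
```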

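Fortunately, RFC 5987 adds an encoding based on RFC 2231 to HTTP headers! (Thanks to @bobince for this find.) It says you can (and probably should) include both an RFC 2231-style value and a plain value: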

Header field specifications need to define whether multiple instances of parameters with identical parmname components are allowed, and how they should be processed. This specification suggests that a parameter using the extended syntax takes precedence. This would allow producers to use both formats without breaking recipients that do not understand the extended syntax yet.

Example:

foo: bar; title="EURO exchange rates"; title*=utf-8''%e2%82%ac%20exchange%20rates

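However, in their example they "dumbed down" the plain value for "legacy clients". That is not really an option for a form field name, so the best approach is probably to include both the name= and name*= versions, where the plain value is (as @bobince describes it) "just sending the bytes, quoted, in the same encoding as the form", for example: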

Content-Disposition: form-data; name="☃"; name*=utf-8''%E2%98%83
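A header line like the one above could be assembled roughly as follows. This is only a sketch of the idea, not what python-requests does internally: the function name `form_data_disposition` is hypothetical, and it assumes the field name contains no double quotes, backslashes, or line breaks, so nothing inside the quoted value needs escaping.

```python
from urllib.parse import quote


def form_data_disposition(field_name, charset="utf-8"):
    """Return a Content-Disposition line carrying both name= and name*=.

    Hypothetical helper; assumes field_name has no quotes, backslashes,
    or line breaks.
    """
    plain = field_name.encode(charset)  # raw bytes in the form's encoding
    extended = quote(field_name, safe="", encoding=charset).encode("ascii")
    return (
        b'Content-Disposition: form-data; name="' + plain + b'"; name*='
        + charset.encode("ascii") + b"''" + extended
    )


header = form_data_disposition("☃")
print(header.decode("utf-8"))
# Content-Disposition: form-data; name="☃"; name*=utf-8''%E2%98%83
```

The result is built as bytes because the plain name= value is sent as raw encoded bytes rather than ASCII text.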

Finally, see also http://larry.masinter.net/1307multipart-form-data.pdf (and https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909#c8 ), which recommend sticking to ASCII form field names to avoid this problem.

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/20591599/
