python - Unicode decode error for a JSON web page

Tags: python json unicode python-requests python-3.4

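As these things tend to go, I'm having trouble decoding some Unicode in Python.

Specifically, this page: xkcd.com/403/info.0.json

The relevant part is Paul Erd\u00c5\u0091s!

When I run it through the JSON decoder, the Unicode escapes are decoded, but not with the correct codec.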
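One way to check whether the JSON decoder is at fault: a JSON \uXXXX escape always becomes exactly one code point, so the two escapes \u00c5\u0091 must produce two characters. The minimal Python 3 sketch below uses only the standard json module and assumes the escaped text is exactly as quoted above; the variable names are illustrative.

```python
import json

# Assumption: this is the escaped JSON text quoted in the question.
raw = r'"Paul Erd\u00c5\u0091s!"'
decoded = json.loads(raw)

# Each \uXXXX escape maps to exactly one code point, so the decoder
# reproduces the data faithfully: the mis-encoding is already in the page.
print([hex(ord(c)) for c in decoded[-4:-2]])  # ['0xc5', '0x91']
print(repr(decoded))  # 'Paul ErdÅ\x91s!'
```

The output matches the string reported below, which suggests the decoder did its job and the data itself holds two characters where one ("ő", U+0151) belongs.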

I'm currently using this one-liner:

requests.get("http://xkcd.com/403/info.0.json").json()["alt"][-12:]

which gives 'Paul ErdÅ\x91s!', and that is clearly not what I want.

Any ideas on how to fix this?
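The symptom is consistent with correct UTF-8 bytes having been decoded as Latin-1 somewhere upstream. That cause is an assumption, not something the page confirms, but the following Python 3 sketch reproduces both the question's string and the pattern used in the answer below by applying that mistake once and twice.

```python
# Assumption: the text was once correct and was then encoded as UTF-8
# and wrongly decoded as Latin-1, possibly more than once.
original = "Paul Erdős!"

once = original.encode("utf-8").decode("latin-1")
print(repr(once))   # 'Paul ErdÅ\x91s!'  (the string reported in the question)

twice = once.encode("utf-8").decode("latin-1")
print(repr(twice))  # 'Paul ErdÃ\x85Â\x91s!'  (the pattern in the answer's example)
```

Each round turns one non-ASCII character into two, so the damage compounds and has to be undone once per round.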

Best answer

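To fix that JSON, you need to encode the string as Latin-1 (because Latin-1 naively maps each character U+0000 to U+00FF back to the single byte with the same value) and then decode the resulting bytes as UTF-8.

Do this twice, because it's double-broken.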

>>> import json
>>> json.loads(r'"Erd\u00c3\u0085\u00c2\u0091s!"')
'ErdÃ\x85Â\x91s!'
>>> json.loads(r'"Erd\u00c3\u0085\u00c2\u0091s!"').encode('latin-1').decode('utf-8')
'ErdÅ\x91s!'
>>> json.loads(r'"Erd\u00c3\u0085\u00c2\u0091s!"').encode('latin-1').decode('utf-8').encode('latin-1').decode('utf-8')
'Erdős!'
>>> print(json.loads(r'"Erd\u00c3\u0085\u00c2\u0091s!"').encode('latin-1').decode('utf-8').encode('latin-1').decode('utf-8'))
Erdős!
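The repeated encode/decode chain in the session above can be wrapped in a small helper. This is a minimal Python 3 sketch; the function name undo_latin1_mojibake and its rounds parameter are hypothetical, and the caller must know how many layers of breakage to undo, since applying an extra round to already-correct text can raise an error or garble it further.

```python
import json


def undo_latin1_mojibake(text: str, rounds: int = 1) -> str:
    """Hypothetical helper: undo `rounds` layers of UTF-8 read as Latin-1.

    Each round encodes the text as Latin-1 (one byte per code point in
    U+0000..U+00FF) and decodes those bytes as UTF-8. It raises
    UnicodeEncodeError or UnicodeDecodeError if the text is not this
    kind of mojibake.
    """
    for _ in range(rounds):
        text = text.encode("latin-1").decode("utf-8")
    return text


# The double-broken string from the session above needs two rounds.
double_broken = json.loads(r'"Erd\u00c3\u0085\u00c2\u0091s!"')
print(undo_latin1_mojibake(double_broken, rounds=2))  # Erdős!

# The string reported in the question ('Paul ErdÅ\x91s!') needs one round.
print(undo_latin1_mojibake("Paul Erd\xc5\x91s!"))  # Paul Erdős!
```

Assuming the question's result is exactly 'Paul ErdÅ\x91s!', a single round on the output of .json()["alt"] is enough; the two rounds in the answer apply to its doubly escaped example.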

A similar question about this Unicode decode error for a JSON web page can be found on Stack Overflow: https://stackoverflow.com/questions/24876817/
