python - Why am I getting SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte

Tags: python json unicode utf-8

I fetched some JSON data from an API. I called json.loads on it and printed the result to the REPL, as shown below.

  {'warnings': {'query': {'*': "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}}, 'query-continue': {'links': {'plcontinue': '25618423|10|R_from_other_capitalisation', 'gplcontinue': "15095968|0|1991_US_Open_-_Women's_Doubles"}}, 'query': {'pages': {'32203010': {'pageid': 32203010, 'title': "1988 Australian Open - Women's Doubles", 'ns': 0}, '25618558': {'pageid': 25618558, 'title': "1984 Wimbledon Championships - Women's Singles", 'ns': 0}, '29486043': {'pageid': 29486043, 'title': "1984 Wimbledon Championships - Women's Doubles", 'ns': 0}, '25618819': {'pageid': 25618819, 'title': "1986 US Open - Women's Singles", 'ns': 0}, '25619314': {'pageid': 25619314, 'title': "1989 US Open - Women's Singles", 'ns': 0}, '25618668': {'pageid': 25618668, 'title': "1985 US Open - Women's Singles", 'ns': 0}, '25618857': {'pageid': 25618857, 'title': "1987 Australian Open - Women's Singles", 'ns': 0}, '25618423': {'links': [{'title': "1983 Wimbledon Championships – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}], 'pageid': 25618423, 'title': "1983 Wimbledon Championships - Women's Singles", 'ns': 0}, '23826062': {'links': [{'title': "1984 French Open – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}, {'title': 'Template:R from other capitalisation', 'ns': 10}, {'title': 'Template:R from plural', 'ns': 10}, {'title': 'Template:R from short name', 'ns': 10}, {'title': 'Category:Redirects from modifications', 'ns': 14}], 'pageid': 23826062, 'title': "1984 French Open - Women's Singles", 'ns': 0}, '25619177': {'pageid': 25619177, 'title': "1989 Australian Open - Women's Singles", 'ns': 0}}}}

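I then copied that data from the REPL into a .py module and assigned it to a variable so that I could run some unit tests against it. But I keep getting this error: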

SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte

What is going on?

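Update: exactly how I get the error. Using Visual Studio, I ran a script that fetches the content with requests and its .text attribute. I then applied json.loads and printed the result to the Visual Studio Python 3.4 Interactive window (a.k.a. the REPL). Then I used the mouse to copy from that REPL and paste into a .py file in Visual Studio.

Update 2: When I fetch the data, I use requests and then the text attribute. When I print it without json.loads, it is fine. However, if I copy this "rawer" output from the REPL and paste it into the file, it is no longer a string but an object, and json.loads will not work on it. Does the Python 3 print function print an object even though it should be JSON?

Here is the raw output from the API via requests' .text, without json.loads: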

{"warnings":{"query":{"*":"Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}},"query-continue":{"links":{"plcontinue":"25618423|10|R_from_other_capitalisation","gplcontinue":"15095968|0|1991_US_Open_-_Women's_Doubles"}},"query":{"pages":{"25618423":{"pageid":25618423,"ns":0,"title":"1983 Wimbledon Championships - Women's Singles","links":[{"ns":0,"title":"1983 Wimbledon Championships \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"}]},"23826062":{"pageid":23826062,"ns":0,"title":"1984 French Open - Women's Singles","links":[{"ns":0,"title":"1984 French Open \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"},{"ns":10,"title":"Template:R from other capitalisation"},{"ns":10,"title":"Template:R from plural"},{"ns":10,"title":"Template:R from short name"},{"ns":14,"title":"Category:Redirects from modifications"}]},"29486043":{"pageid":29486043,"ns":0,"title":"1984 Wimbledon Championships - Women's Doubles"},"25618558":{"pageid":25618558,"ns":0,"title":"1984 Wimbledon Championships - Women's Singles"},"25618668":{"pageid":25618668,"ns":0,"title":"1985 US Open - Women's Singles"},"25618819":{"pageid":25618819,"ns":0,"title":"1986 US Open - Women's Singles"},"25618857":{"pageid":25618857,"ns":0,"title":"1987 Australian Open - Women's Singles"},"32203010":{"pageid":32203010,"ns":0,"title":"1988 Australian Open - Women's Doubles"},"25619177":{"pageid":25619177,"ns":0,"title":"1989 Australian Open - Women's Singles"},"25619314":{"pageid":25619314,"ns":0,"title":"1989 US Open - Women's Singles"}}}}

Best answer

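Your text contains EN DASH (U+2013) characters. In the Windows-1252 codec they map to the byte \x96. You have an encoding problem, but the exact cause depends on the steps you took to copy the text into the .py file. I cut and pasted the text from the question into Notepad++, set the encoding to ANSI, and assigned it to a variable, with this result: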

  File "C:\temp.py", line 1
SyntaxError: unknown decode error

But with UTF-8 or UTF-8 without BOM selected as the encoding, it works correctly. If there is no #coding: comment declaring the source encoding, Python 3 assumes UTF-8.
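To make the byte-level picture concrete, here is a minimal sketch (not part of the original answer) showing that an EN DASH becomes the single byte 0x96 under Windows-1252, and that this byte on its own is not valid UTF-8. The variable names are illustrative only.

```python
# EN DASH (U+2013), the character present in the pasted data.
en_dash = "\u2013"

# Windows-1252 ("cp1252") stores it as the single byte 0x96.
cp1252_bytes = en_dash.encode("cp1252")

# UTF-8 needs three bytes for the same character.
utf8_bytes = en_dash.encode("utf-8")

# Decoding the cp1252 byte as UTF-8 fails: 0x96 is a continuation
# byte and cannot start a UTF-8 sequence.
try:
    cp1252_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(cp1252_bytes, utf8_bytes)
print(decode_error)
```

This is the same decoding failure Python reports when it reads a source file saved as Windows-1252 while assuming UTF-8.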

Note that ANSI on my US Windows system is actually Windows-1252. Using ANSI and adding #coding:windows-1252 also works correctly. Python needs to be told the source encoding if it differs from the default (ascii on Python 2 and utf-8 on Python 3).
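The effect of the encoding declaration can be reproduced without an editor. The sketch below is an illustration under stated assumptions, not the answerer's procedure: it passes raw source bytes, encoded as Windows-1252, to the built-in compile(), once without and once with a # coding: line. The helper run_source, the file label "<pasted>", and the variable title are hypothetical names chosen for the example.

```python
def run_source(source_bytes):
    """Compile and run raw source bytes; return the resulting namespace."""
    namespace = {}
    # compile() accepts bytes and decodes them the way a source file is
    # decoded: UTF-8 by default on Python 3, or whatever a "# coding:"
    # line at the top declares.
    exec(compile(source_bytes, "<pasted>", "exec"), namespace)
    return namespace

# One line of source as an editor set to "ANSI" (Windows-1252) would
# save it: the EN DASH is stored as the single byte 0x96.
body = 'title = "1983 Wimbledon Championships \u2013 Singles"\n'.encode("cp1252")

# Without a declaration, Python 3 assumes UTF-8 and rejects the byte.
try:
    run_source(body)
    error_without_cookie = None
except (SyntaxError, UnicodeDecodeError) as exc:
    error_without_cookie = exc

# With the declaration, the same bytes decode correctly.
ns = run_source(b"# coding: windows-1252\n" + body)

print(type(error_without_cookie).__name__)
print(ns["title"])
```

Another option, if changing the file encoding is not practical, is to keep the source pure ASCII by writing the character as the escape \u2013, which is how it appears in the raw API output quoted in the question.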

This question, "python - Why am I getting SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte", comes from Stack Overflow: https://stackoverflow.com/questions/29711124/
