(This question is related to this one)
Consider the following session:
Python 2.7.3 (default, Jan 2 2013, 16:53:07)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import simplejson as json
>>>
>>> my_json = '''[
... {
... "id" : "normal",
... "txt" : "This is a normal entry"
... },
... {
... "id" : "αβγδ",
... "txt" : "This is a unicode entry"
... }
... ]'''
>>>
>>> cache = json.loads(my_json, encoding='utf-8')
>>>
>>> cache
[{'txt': 'This is a normal entry', 'id': 'normal'}, {'txt': 'This is a unicode entry', 'id': u'\u03b1\u03b2\u03b3\u03b4'}]
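Why does the json decoder sometimes produce unicode and sometimes a plain string? Shouldn't it always produce unicode?
Best answer
This appears to be an optimization in simplejson. From the simplejson docs: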
If s is a str then decoded JSON strings that contain only ASCII characters may be parsed as str for performance and memory reasons. If your code expects only unicode the appropriate solution is decode s to unicode prior to calling decode.
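A minimal sketch of the fix the docs describe: decode the input to unicode first, then parse it. The byte literal `raw` and the names `text`, `ids` and `text_type` are illustrative and not from the original question. The fallback to the standard-library `json` module is an assumption, added only so the sketch still runs where simplejson is not installed.

```python
try:
    import simplejson as json  # the library used in the question
except ImportError:
    import json  # assumption: stdlib fallback so the sketch still runs

# Raw UTF-8 bytes, as they might arrive from a file or a socket.
# The second "id" is the Greek string from the question, UTF-8 encoded.
raw = b'[{"id": "normal"}, {"id": "\xce\xb1\xce\xb2\xce\xb3\xce\xb4"}]'

# Decode to unicode text *before* parsing, as the docs recommend.
text = raw.decode('utf-8')
cache = json.loads(text)

ids = [entry['id'] for entry in cache]
text_type = type(u'')  # unicode on Python 2, str on Python 3

# Every decoded string should now be of the text type.
print([isinstance(i, text_type) for i in ids])
```

Because the parser is handed unicode rather than a byte `str`, the ASCII-only shortcut described in the docs should no longer apply, so both `"normal"` and the Greek id come back as the same type.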
Note: every character in ASCII is encoded identically in UTF-8 and in ASCII, so ASCII is a subset of UTF-8.
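The subset relationship can be checked directly. This is a small illustration, not part of the original answer, and the variable names are made up for the example:

```python
ascii_text = u'normal'
greek_text = u'\u03b1\u03b2\u03b3\u03b4'  # the Greek id from the question

# ASCII-only text yields identical bytes under both encodings.
same_bytes = ascii_text.encode('ascii') == ascii_text.encode('utf-8')

# Non-ASCII text cannot be encoded as ASCII at all.
try:
    greek_text.encode('ascii')
    greek_is_ascii = True
except UnicodeEncodeError:
    greek_is_ascii = False

# In UTF-8 each of these Greek letters takes two bytes.
utf8_len = len(greek_text.encode('utf-8'))

print(same_bytes, greek_is_ascii, utf8_len)
```

This is why returning a byte `str` for an ASCII-only value loses no information: its bytes are the same whether they are read as ASCII or as UTF-8.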
Regarding "python - Json decoder inconsistent when unicode data is present", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19701806/