python - Decode function tries to encode Python

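I am trying to print a unicode string without the hex escape sequences in it. I'm getting this data from Facebook, whose HTML headers declare the encoding as UTF-8. When I print the type, it says it is unicode, but when I try to decode it with unicode-escape, I get an encoding error. Why is it trying to encode when I call the decode method?

Code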

a='really long string of unicode html text that i wont reprint'
print type(a)
 >>> <type 'unicode'>   
print a.decode('unicode-escape')
 >>> Traceback (most recent call last):
  File "scfbp.py", line 203, in myFunctionPage
    print a.decode('unicode-escape')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 1945: ordinal not in range(128)

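Best answer

It's not the decode that is failing. The failure happens because you are trying to display the result on the console. When you use print, the string is encoded for output, and the encoding in effect here is the default, ASCII, which cannot represent u'\u20ac'. Don't use print and it should work.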

>>> a=u'really long string containing \\u20ac and some other text'
>>> type(a)
<type 'unicode'>
>>> a.decode('unicode-escape')
u'really long string containing \u20ac and some other text'
>>> print a.decode('unicode-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 30: ordinal not in range(128)

I'd recommend using IDLE or some other interpreter that can output unicode, so that you don't run into this problem.
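If switching consoles is not an option, one possible workaround is to encode the text explicitly before writing it out. The sketch below is only an illustration and not part of the original answer: the helper name `encode_for_stream` is hypothetical, the sample string is made up, and the code is written for Python 3 syntax. It uses the standard `backslashreplace` error handler so that characters the target encoding cannot represent become escape sequences instead of raising `UnicodeEncodeError`.

```python
import sys


def encode_for_stream(text, stream_encoding):
    """Encode text for an output stream without raising on unsupported characters.

    Hypothetical helper for illustration only.
    """
    # Fall back to ASCII when the stream reports no encoding at all.
    encoding = stream_encoding or "ascii"
    # 'backslashreplace' swaps unencodable characters for backslash escapes.
    return text.encode(encoding, "backslashreplace")


text = u"price: \u20ac5"

# What an ASCII-only console could safely receive.
print(repr(encode_for_stream(text, "ascii")))

# What a UTF-8 console would receive.
print(repr(encode_for_stream(text, "utf-8")))

# The encoding the current interpreter reports for standard output.
print(sys.stdout.encoding)
```

Checking `sys.stdout.encoding` first is a quick way to tell whether the console, rather than the decode call, is the likely source of the error.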

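Update: Note that this is not the same as the case with one fewer backslash. In Python 2, calling decode on a unicode object first implicitly encodes it to a byte string with the default ASCII codec, so a string that already contains a non-ASCII character fails in that hidden encode step. That case fails during the decode call itself, but with the same error message: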

>>> a=u'really long string containing \u20ac and some other text'
>>> type(a)
<type 'unicode'>
>>> a.decode('unicode-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 30: ordinal not in range(128)
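
The hidden encode step can be made visible by writing it out by hand. The following is a minimal sketch, not code from the original answer: it is written for Python 3, where strings have no decode method, so the function `decode_escapes` (a hypothetical name) performs explicitly the two steps that Python 2 performs when decode is called on a unicode object, assuming the default codec is ASCII.

```python
def decode_escapes(text):
    """Mimic Python 2's unicode.decode('unicode-escape') with both steps explicit.

    Hypothetical helper for illustration only.
    """
    # Step 1: the implicit encode with the ASCII codec. This raises
    # UnicodeEncodeError if the text already holds a non-ASCII character.
    raw = text.encode("ascii")
    # Step 2: the decode that was actually requested.
    return raw.decode("unicode-escape")


escaped = u"containing \\u20ac"  # literal backslash sequence, ASCII only
actual = u"containing \u20ac"    # already contains a real euro sign

# Step 1 succeeds, and step 2 turns the escape sequence into the euro sign.
print(decode_escapes(escaped) == actual)

try:
    # Step 1 fails here, before any decoding is attempted.
    decode_escapes(actual)
except UnicodeEncodeError as exc:
    print("encode step failed:", exc.reason)
```

Seen this way, the error in the question is consistent with the fetched text already containing a real euro sign rather than an escaped one, in which case there is nothing left to decode with unicode-escape.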

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/4799917/
