python - How does Python 2 handle str and unicode internally?

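I am confused about how Python handles unicode and str. I ran into the following situation in Python 2.

The statements below were written in a .py file saved with UTF-8 encoding in the PyCharm IDE.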

print "hello!%s"% u"中国"
print "hello!%s"% "中国"
print u"hello!%s"% "中国"

Only case 3 raises a decode error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128).

Can someone explain how Python processes these statements, and why the results differ?

Best answer

If you drop the print statement, you can see more detail:

>>> "hello! %s" % u"中国"
u'hello! \u4e2d\u56fd'
>>> "hello! %s" % "中国"
'hello! \xe4\xb8\xad\xe5\x9b\xbd'
>>> u"hello! %s" % "中国"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

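This gives us the clue. Whenever a unicode string is involved in an operation, Python 2 tries to convert the other operand to unicode as well; and, as usual, unless told otherwise, it assumes the bytes are encoded as ASCII (the default encoding reported by sys.getdefaultencoding()).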
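
The implicit conversion described above can be made visible by performing the same ASCII decode by hand. This is a minimal sketch, not Python's actual internals: `implicit_decode` is a hypothetical helper introduced only for illustration, and the example is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: implicit_decode is a hypothetical helper, not a Python API.
# It spells out the ASCII decode that Python 2 applies implicitly when a
# byte string meets a unicode string.

def implicit_decode(byte_string):
    return byte_string.decode("ascii")

# The six bytes a UTF-8 source file stores for the two characters in "中国".
utf8_bytes = u"\u4e2d\u56fd".encode("utf-8")

# Pure-ASCII bytes decode cleanly, which is why case 1 works.
template = implicit_decode(b"hello! %s")

# Non-ASCII bytes cannot be decoded as ASCII, which is why case 3 fails.
try:
    implicit_decode(utf8_bytes)
    outcome = "decoded"
except UnicodeDecodeError as exc:
    outcome = "failed: " + exc.reason
```

The error is the same one shown in the traceback, because the first UTF-8 byte, 0xe4, lies outside the ASCII range.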

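In the first case, Python converts the byte string "hello! %s" to unicode; since it contains no non-ASCII characters, the conversion succeeds, and the existing unicode string can safely be interpolated into the result.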

In the second case, both sides are byte strings, so no conversion is attempted; the result is still a byte string.
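
As a rough illustration of why the bytes-only case never fails, the snippet below formats bytes into bytes with no decoding step. It assumes Python 2 or Python 3.5+, where `%` formatting is available for byte strings, and it builds the UTF-8 bytes from escapes so that the result does not depend on the source file's encoding.

```python
# -*- coding: utf-8 -*-
# The UTF-8 encoding of the two characters in "中国": three bytes each.
utf8_bytes = u"\u4e2d\u56fd".encode("utf-8")

# Both operands are byte strings, so the raw bytes are copied through
# unchanged and no codec is ever consulted.
result = b"hello! %s" % utf8_bytes
```

The value of `result` holds the same escaped bytes as the interpreter output for the second case.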

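In the third case, "hello! %s" is already unicode, so Python tries to convert the other operand; because those bytes are not ASCII, the conversion fails. Specifying the encoding explicitly, however, does work: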

>>> u"hello! %s" % "中国".decode('utf-8')
u'hello! \u4e2d\u56fd'
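
One way to apply the explicit decode consistently is to convert byte strings to unicode at the point where they enter the program. The sketch below assumes the incoming bytes are UTF-8; `to_unicode` is a hypothetical helper name, not a standard library function, and the code runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_unicode is a hypothetical helper, and UTF-8 is an
# assumption about how the incoming bytes were encoded.

def to_unicode(value, encoding="utf-8"):
    # In Python 2, bytes is an alias for str, so this catches byte strings.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already unicode, return unchanged

utf8_bytes = u"\u4e2d\u56fd".encode("utf-8")

# The argument is decoded explicitly, so no implicit ASCII decode occurs.
greeting = u"hello! %s" % to_unicode(utf8_bytes)
```

Decoding once at the boundary means later string operations only ever mix unicode with unicode, so the implicit ASCII conversion never comes into play.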

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35792962/
