python - Any gotchas using unicode_literals in Python 2.6?

We already have our code base running under Python 2.6. To prepare for Python 3.0, we have started adding:

from __future__ import unicode_literals

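to our .py files (as we modify them). I'm wondering whether anyone else has been doing this and has run into any non-obvious gotchas (perhaps after spending a lot of time debugging).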

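Best answer

The main source of problems I've had when working with unicode strings is mixing UTF-8 encoded byte strings with unicode strings.

For example, consider the following two scripts.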

two.py

# encoding: utf-8
name = 'helló wörld from two'

one.py

# encoding: utf-8
from __future__ import unicode_literals
import two
name = 'helló wörld from one'
print name + two.name

The output of running python one.py is:

Traceback (most recent call last):
  File "one.py", line 5, in <module>
    print name + two.name
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

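In this example, two.name is a UTF-8 encoded byte string (not unicode), because two.py does not import unicode_literals, while one.name is a unicode string. When you mix the two, Python tries to convert the encoded string to unicode by decoding it as ASCII, and that fails on the first non-ASCII byte. It works if you decode explicitly: print name + two.name.decode('utf-8').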
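The implicit conversion described above can be made visible by doing it by hand. This is only a minimal sketch: implicit_decode, one_name and two_name are names made up for illustration, and the byte string is built with encode() so that the snippet behaves the same under Python 2.6+ and Python 3.

```python
# encoding: utf-8
from __future__ import unicode_literals

# Stand-in for two.name under Python 2: the UTF-8 bytes of the literal.
two_name = 'helló wörld from two'.encode('utf-8')
# Stand-in for one.name: a unicode (text) string.
one_name = 'helló wörld from one'


def implicit_decode(byte_string):
    """Hypothetical helper: decode with the ASCII codec, which is what
    Python 2 does behind the scenes when str and unicode are mixed."""
    return byte_string.decode('ascii')


try:
    implicit_decode(two_name)
    error = None
except UnicodeDecodeError as exc:
    # Fails on byte 0xc3, the first byte of the encoded 'ó'.
    error = exc

# Decoding explicitly with the real encoding avoids the failure.
combined = one_name + two_name.decode('utf-8')
```

Here error.start is the offset of the offending byte, which corresponds to the "position 4" reported in the traceback above.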

The same thing can happen if you encode a string and later try to mix it with unicode strings. For example, this works:

# encoding: utf-8
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

DEBUG: <html><body>helló wörld</body></html>

But after adding the unicode_literals import it does not:

# encoding: utf-8
from __future__ import unicode_literals
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print 'DEBUG: %s' % html
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

It fails because 'DEBUG: %s' is now a unicode string, so Python tries to decode html (by then a UTF-8 encoded byte string) as ASCII. Two ways to fix the print are print str('DEBUG: %s') % html or print 'DEBUG: %s' % html.decode('utf-8').
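A more general variant of the second fix is to decode byte strings once, at the point where they meet unicode text. The sketch below assumes the bytes are UTF-8; to_text is a hypothetical helper, not something from the standard library. It relies on bytes being an alias for str in Python 2.6+, so it also runs under Python 3.

```python
# encoding: utf-8
from __future__ import unicode_literals


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as text, decoding byte strings.

    Text (unicode) values are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


html = '<html><body>helló wörld</body></html>'
encoded = html.encode('utf-8')  # the byte string from the failing example

# Both operands of % are text now, so no implicit ASCII decode takes place.
message = 'DEBUG: %s' % to_text(encoded)
```

Under Python 2, printing message may still require encoding it for the terminal; that depends on the environment and is not covered here.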

I hope this helps you understand the potential gotchas when using unicode strings.

The original question and answer can be found on Stack Overflow: https://stackoverflow.com/questions/809796/
