python - Formatting fixed-width strings (unicode and utf8)

I need to parse some data and output it in a table-like format. The input is unicode. Here is the test script:

#!/usr/bin/env python

s1 = u'abcd'
s2 = u'\u03b1\u03b2\u03b3\u03b4'

print '1234567890'
print '%5s' % s1
print '%5s' % s2

When invoked simply as test.py, it works as expected:

1234567890
 abcd
 αβγδ

But if I redirect the output to a file with test.py > a.txt, I get an error:

Traceback (most recent call last):
  File "./test.py", line 8, in <module>
    print '%5s' % s2
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)

If I convert the string to UTF-8, as in s2.encode('utf8'), redirection works, but the column alignment breaks:

1234567890
 abcd
αβγδ

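How can I make it work correctly in both cases?

Best answer

It comes down to the encoding of your output stream. In this particular case, since you are using print, the output file is sys.stdout.

Interactive mode / `stdout` not redirected

When you run Python in interactive mode, or when stdout is not redirected to a file, Python picks an encoding based on the environment, that is, on locale environment variables such as LC_CTYPE. For example, if you run the program like this: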

$ LC_CTYPE='en_US' python test.py
...
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)

it will use ANSI_X3.4-1968 (that is, ASCII) as the encoding of sys.stdout (see sys.stdout.encoding) and fail. However, if you use UTF-8 (as you evidently already do):

$ LC_CTYPE='en_US.UTF-8' python test.py
1234567890
 abcd
 αβγδ

you will get the expected output.
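To see which case applies on a given machine, it can help to inspect the stream directly. The following is a minimal sketch; the helper name describe_stdout is hypothetical and not part of the original answer. In Python 2 the encoding is typically None when stdout is redirected, which is why the implicit ASCII fallback kicks in.

```python
import locale
import sys


def describe_stdout():
    """Return (encoding, is_tty) for the current sys.stdout.

    Hypothetical helper for illustration only.
    """
    encoding = getattr(sys.stdout, 'encoding', None)
    is_tty = bool(hasattr(sys.stdout, 'isatty') and sys.stdout.isatty())
    return encoding, is_tty


encoding, is_tty = describe_stdout()
# Report on stderr so the diagnostics do not depend on stdout's encoding.
sys.stderr.write('stdout encoding: %s\n' % encoding)
sys.stderr.write('stdout is a tty: %s\n' % is_tty)
sys.stderr.write('locale preferred encoding: %s\n' % locale.getpreferredencoding())
```

Running this once normally and once with > a.txt should show how the reported encoding changes between the two cases.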

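`stdout` redirected to a file

When you redirect stdout to a file, Python does not try to detect the encoding from your environment locale. Instead it checks another environment variable, PYTHONIOENCODING (see initstdio() in Python/pylifecycle.c in the source). For example, this works as expected: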

$ PYTHONIOENCODING=utf-8 python test.py >/tmp/output

because Python will use UTF-8 as the encoding for the /tmp/output file.
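The effect of PYTHONIOENCODING can also be checked from a script. This is a rough sketch, assuming a pipe behaves like a file redirect for this purpose (stdout is not a terminal in either case); the helper name child_stdout_encoding is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and return
    the value it reports for sys.stdout.encoding.

    The child's stdout is a pipe, so it is not attached to a terminal.
    """
    env = dict(os.environ)
    env['PYTHONIOENCODING'] = io_encoding
    code = 'import sys; sys.stdout.write(str(sys.stdout.encoding))'
    out = subprocess.check_output([sys.executable, '-c', code], env=env)
    return out.decode('ascii').strip()


reported = child_stdout_encoding('utf-8')
```

Here reported holds the encoding name the child interpreter chose for its redirected stdout.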

Manual `stdout` encoding override

You can also manually re-wrap sys.stdout with the desired encoding (this is covered in other SO questions):

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)

Now print will correctly output both str and unicode objects, because the underlying stream writer converts them to UTF-8 on the fly.
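What the wrapper does can be seen without touching the real sys.stdout. In this sketch an io.BytesIO buffer stands in for a byte-oriented stdout (an assumption made so the result can be inspected); the key point is that padding is applied to the unicode text first and the writer encodes afterwards.

```python
import codecs
import io

raw = io.BytesIO()                    # stand-in for a byte-oriented stdout
out = codecs.getwriter('utf8')(raw)   # same kind of wrapper as used for sys.stdout

# Formatting happens on unicode text, so the width counts characters.
out.write(u'%5s\n' % u'abcd')
out.write(u'%5s\n' % u'\u03b1\u03b2\u03b3\u03b4')

data = raw.getvalue()                 # the UTF-8 bytes that were actually written
text = data.decode('utf8')            # decode again to check the alignment
```

Both lines in text start with exactly one space, so the columns line up even though the second line occupies more bytes.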

Manual string encoding before output

Of course, you can also manually encode each unicode object to a UTF-8 str before output:

print ('%5s' % s2).encode('utf8')

But that is both tedious and error-prone.
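The order of operations matters here, and it explains the broken alignment in the question: padding applied to an already encoded byte string counts bytes, not characters. A small sketch of the difference follows; rjust is used on the byte string in place of '%5s' because it follows the same padding rule and behaves the same across Python versions.

```python
s2 = u'\u03b1\u03b2\u03b3\u03b4'

# Correct order: pad the unicode text, then encode the result.
padded_then_encoded = (u'%5s' % s2).encode('utf8')

# Wrong order: encode first, then pad the byte string.
encoded = s2.encode('utf8')
encoded_then_padded = encoded.rjust(5)

char_count = len(s2)        # 4 characters
byte_count = len(encoded)   # 8 bytes, two per Greek letter in UTF-8
```

Since the encoded string is already 8 bytes long, a width of 5 adds no padding at all, which matches the unaligned output shown in the question.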

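Opening a file explicitly

For completeness: when opening a file to be written in a specific encoding (such as UTF-8) in Python 2, you should use io.open or codecs.open, because they let you specify the encoding (see the related question), unlike the built-in open: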

from codecs import open
myfile = open('filename', 'w', encoding='utf-8')  # 'w': open for writing

or:

from io import open
myfile = open('filename', 'w', encoding='utf-8')  # 'w': open for writing
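As a slightly fuller sketch of this approach, the snippet below writes the two test strings to a temporary file with io.open and reads them back. The file name table.txt and the helpers write_table and read_lines are hypothetical, chosen only for illustration.

```python
import io
import os
import tempfile


def write_table(path, rows):
    """Write each row right-aligned in a 5-character column, as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as f:
        for row in rows:
            f.write(u'%5s\n' % row)   # pad the unicode text; io.open encodes it


def read_lines(path):
    """Read the file back as unicode text, one entry per line."""
    with io.open(path, encoding='utf-8') as f:
        return f.read().splitlines()


path = os.path.join(tempfile.mkdtemp(), 'table.txt')
write_table(path, [u'abcd', u'\u03b1\u03b2\u03b3\u03b4'])
lines = read_lines(path)
```

Because the file object handles the encoding, the formatting code only ever deals with unicode text, and the result does not depend on how stdout is configured.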

Regarding python - formatting fixed-width strings (unicode and utf8), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45783034/
