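Python encoding problem when reading from a text file

I am reading a text file that contains the single word B\xc3\xa9zier, written with literal backslash escapes.

I want to convert it to the equivalent decoded UTF-8 string, Bézier, and print that to the console.

My code is as follows: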

foo=open("test.txt")  
for line in foo.readlines():  
    for word in line.split():  
        print(word.decode('utf-8'))
foo.close()

The output is:

B\xc3\xa9zier

But if I do this in the interpreter:

>>> print('B\xc3\xa9zier'.decode('utf-8'))

I get the correct output:

Bézier

I can't figure out why the two cases behave differently.

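Best answer

It looks like your file contains a raw, escaped UTF-8 string: the literal characters \, x, c, 3 and so on, rather than the two bytes 0xC3 0xA9. In the interpreter, Python parses the string literal and turns those escapes into bytes before decode('utf-8') runs, which is why that case works. Decode the text from the file with string_escape first, then as UTF-8: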

with open('test.txt') as f:
    for line in f:
        for word in line.split():
            print(word.decode('string_escape').decode('utf-8'))


Bézier
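
The string_escape codec and str.decode exist only in Python 2. As a rough sketch of the same idea for Python 3, assuming the file holds pure ASCII text with literal \xNN escapes as in the question, the escapes can be unescaped with the unicode_escape codec and the result round-tripped through Latin-1 to recover the raw bytes. The helper name decode_escaped_utf8 and the two sample variables are made up for illustration and are not part of the original answer.

```python
# Python 3 sketch; decode_escaped_utf8 is a hypothetical helper name.

def decode_escaped_utf8(word: str) -> str:
    """Turn text containing literal \\xNN escapes into the UTF-8 string they spell."""
    # Assumption: the input is pure ASCII, as in the question's file.
    # "unicode_escape" interprets each literal \xNN as the code point U+00NN.
    unescaped = word.encode("ascii").decode("unicode_escape")
    # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # which recovers the raw UTF-8 bytes.
    raw = unescaped.encode("latin-1")
    return raw.decode("utf-8")


# What the file holds: 13 ASCII characters, backslashes included.
from_file = r"B\xc3\xa9zier"
# What the interpreter builds from an escaped literal: 7 bytes.
from_literal = b"B\xc3\xa9zier"

# The two inputs differ in length, which is the root of the confusion.
print(len(from_file), len(from_literal))
# After unescaping, both routes should yield the same text.
print(decode_escaped_utf8(from_file) == from_literal.decode("utf-8"))
```

If the file really contained the UTF-8 bytes themselves instead of escape text, none of this would be needed: opening it with open('test.txt', encoding='utf-8') should be enough.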

The original question about this Python encoding problem is on Stack Overflow: https://stackoverflow.com/questions/16916261/
