python 3 : Read UTF-8 file containing German umlaut

I searched and found many similar questions and articles, but none of them helped me solve this problem.

I'm using Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on Windows 10.

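I have a simple text file, encoded as UTF-8 for Windows, with this content:

    I would like the following characters to display correctly after reading this file into Python:

    ÄÖÜäöüß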

All I want to do is read the contents of this file into a Python string and display it correctly in the standard console.

Here is my first, failed attempt:

    file_name=r'c:\temp\encoding_test.txt'
    fh=open(file_name,'r')
    f_str=fh.read()
    fh.close()
    print(f_str)

The print statement raises an exception:

    'charmap' codec can't encode character '\u201e' in position 100: character maps to undefined

In the debugger, f_str contains the following:

    'I would like the following characters to display correctly after reading this file into Python:\n\nÃ„Ã–ÃœÃ¤Ã¶Ã¼ÃŸ\n'

This already puzzles me. Doesn't Python 3 use UTF-8 as the default everywhere? What other encoding could possibly work? I tried every encoding Notepad++ supports, without success.
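For illustration, the debugger output can be reproduced by decoding the file's UTF-8 bytes as cp1252. This is a sketch under an assumption the question does not confirm: that the Windows default (ANSI) code page on this machine is cp1252, which is typical for German and Western European locales. It also shows where the `'\u201e'` in the error message comes from. That character is part of the mis-decoded text, and the console's code page cannot represent it.

```python
# Assumption: the Windows default code page is cp1252 (not stated in the question).
original = "ÄÖÜäöüß"

# The file holds UTF-8 bytes, but open() without encoding= decodes them
# with the locale default, here assumed to be cp1252.
utf8_bytes = original.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")

# ascii() escapes non-ASCII characters so this prints on any console;
# the output contains '\u201e', the character print() failed on.
print(ascii(mojibake))

# For this string the damage is reversible: re-encode as cp1252, decode as UTF-8.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)
```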

OK, so I tried something slightly more elaborate:

    import codecs
    file_name=r'c:\temp\encoding_test.txt'
    my_encoding='utf-8'
    fh=codecs.open(file_name,'r',encoding=my_encoding)
    f_str=fh.read().encode(my_encoding)
    fh.close()
    print(f_str)

This at least does not raise an exception, but it produces

    b'I would like the following characters to display correctly after reading this file into Python:\r\n\r\n\xc3\x84\xc3\x96\xc3\x9c\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f\r\n'

This is a complete mess to me. Can anyone here help me sort this out?
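Here is a sketch of what the second attempt actually does. It uses a hypothetical temporary file with the same kind of content (UTF-8 with Windows CRLF line endings). `codecs.open` already returns a correctly decoded `str`. The extra `.encode(my_encoding)` turns that string back into `bytes`, and `print()` shows bytes as a `b'...'` literal with `\x..` escapes. The output is therefore correct UTF-8, just not decoded. The `\r\n` survives because `codecs.open` always opens the underlying file in binary mode, so no newline translation takes place.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the question's file: UTF-8 bytes with CRLF line endings.
raw = "Umlauts:\r\n\r\nÄÖÜäöüß\r\n".encode("utf-8")
fd, file_name = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

my_encoding = "utf-8"
fh = codecs.open(file_name, "r", encoding=my_encoding)
text = fh.read()                 # str, already decoded correctly; '\r\n' is kept
fh.close()
data = text.encode(my_encoding)  # the extra .encode() turns it back into bytes

print(type(text), len(text))     # a str holding the real characters
print(type(data), data)          # bytes are printed as b'...' with \x escapes
os.remove(file_name)
```

In other words, dropping `.encode(my_encoding)` and printing `text` directly would already have been enough in this attempt.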

Best answer

The accepted answer is too complicated. All you need to do is specify the encoding when you call open:

    fh = open(file_name, encoding='utf8')

And everything works.
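The sketch below expands that one-line fix into a runnable example. It writes a hypothetical temporary file in place of `c:\temp\encoding_test.txt` so it can run anywhere, and it uses a `with` block so the file is closed automatically. Reading through the built-in `open()` in text mode also converts the Windows `\r\n` line endings to `\n`.

```python
import os
import tempfile

# Hypothetical stand-in for c:\temp\encoding_test.txt:
# UTF-8 bytes with Windows-style CRLF line endings.
raw = ("I would like the following characters to display correctly "
       "after reading this file into Python:\r\n\r\nÄÖÜäöüß\r\n").encode("utf-8")
fd, file_name = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# The fix: tell open() which encoding the file uses instead of
# relying on the locale-dependent default.
with open(file_name, encoding="utf-8") as fh:
    f_str = fh.read()  # text mode: '\r\n' becomes '\n'

# Printing still requires a console that can display the umlauts.
print(f_str)
os.remove(file_name)
```

As a side note, if the file was saved with a UTF-8 byte order mark, reading it with `encoding="utf-8"` leaves a `'\ufeff'` character at the start of the string, and `encoding="utf-8-sig"` strips it. The question does not say whether the file has a BOM.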

As for your other question:

Doesn't Python 3 use UTF-8 as a default everywhere?

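“Not when it communicates with the outside world (the file system, in this case), because that would be inconsistent with your operating system.” The documentation says that open()'s default encoding is the user's preferred encoding, which depends on the locale. Run

    >>> import locale
    >>> locale.getpreferredencoding()

to see what it is on your system. On Windows it is most likely "cp" followed by a number (for example cp1252), depending on the exact default code page. But you can always override it with the explicit encoding argument of open.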
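The sketch below contrasts the two cases. It prints the locale's preferred encoding, then opens a hypothetical temporary UTF-8 file once with the default and once with an explicit `encoding`, and reports which codec each file object uses through its `encoding` attribute. On Windows the default will typically be a `cp…` code page. On most Linux and macOS setups it is already UTF-8, which is likely why the same code often works there without an explicit encoding.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when encoding= is omitted (locale dependent).
preferred = locale.getpreferredencoding(False)
print("preferred encoding:", preferred)

# Hypothetical UTF-8 file, used only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("ÄÖÜäöüß\n")

with open(path) as f:                    # no encoding=: uses the locale default
    default_enc = f.encoding
with open(path, encoding="utf-8") as f:  # explicit override, same on every platform
    explicit_enc = f.encoding
    text = f.read()

print("default open():", default_enc)
print("explicit open():", explicit_enc)
os.remove(path)
```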

There you go. I hope you learned something new. :-)

A similar question, python 3 : Read UTF-8 file containing German umlaut, can be found on Stack Overflow: https://stackoverflow.com/questions/36242200/
