python - Why do all of these unicode commands work properly in Python? Whatever I do, they print my characters correctly

I probably don't understand this at all, so could you look at the code examples and tell me what I should do to make sure it works?

I tried this in Eclipse with PyDev. I use Python 2.6.6 (because some libraries don't support Python 2.7).

First, without using the codecs module:

# -*- coding: utf-8 -*-

file1 = open("samoloty1.txt", "w")
file2 = open("samoloty2.txt", "w")
file3 = open("samoloty3.txt", "w")
file4 = open("samoloty4.txt", "w")
file5 = open("samoloty5.txt", "w")
file6 = open("samoloty6.txt", "w")

# I know that this is weird, but it shows that whatever I do, nothing gets ruined...
print u"ą✈✈"
file1.write(u"ą✈✈")
print "ą✈✈"
file2.write("ą✈✈")

print "ą✈✈".decode("utf-8")
file3.write("ą✈✈".decode("utf-8"))
print "ą✈✈".encode("utf-8")
file4.write("ą✈✈".encode("utf-8"))

print u"ą✈✈".decode("utf-8")
file5.write(u"ą✈✈".decode("utf-8"))
print u"ą✈✈".encode("utf-8")
file6.write(u"ą✈✈".encode("utf-8"))

file1.close()
file2.close()
file3.close()
file4.close()
file5.close()
file6.close()

file1 = open("samoloty1.txt", "r")
file2 = open("samoloty2.txt", "r")
file3 = open("samoloty3.txt", "r")
file4 = open("samoloty4.txt", "r")
file5 = open("samoloty5.txt", "r")
file6 = open("samoloty6.txt", "r")

print file1.read()
print file2.read()
print file3.read()
print file4.read()
print file5.read()
print file6.read()

Every one of these prints works fine, and I don't get any garbled characters.

I also tried this: I deleted all the files created by the previous test and changed only these lines:

file1 = open("samoloty1.txt", "w")

to these:

file1 = codecs.open("samoloty1.txt", "w", encoding='utf-8')

Everything works...

Can anyone give some examples of what works and what doesn't?

Should this be a separate question? I am downloading web pages this way:

content = urllib.urlopen(some_url).read()
ucontent = unicode(content, encoding) # i get encoding from headers

Is this correct and sufficient? What should I do next to store it in a UTF-8 file? (I'm asking because whatever I did before, it worked...)

** Update **

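Probably everything worked because PyDev (or just Eclipse) has a terminal that uses UTF-8. So, for testing, I used cmd in Windows 7 and got some errors. Now everything crashes as expected. :D Here are the changes I made to get it working again (all of them make sense to me, and they are consistent with what I learned from the answers and from the Python documentation).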

print u"ą✈✈".encode("utf-8") # added encode
file1.write(u"ą✈✈".encode("utf-8")) # added encode
print "ą✈✈"
file2.write("ą✈✈")

print "ą✈✈" # removed .decode("utf-8")
file3.write("ą✈✈") # removed .decode("utf-8"))
print "ą✈✈" # removed .encode("utf-8")
file4.write("ą✈✈") # removed .encode("utf-8"))

print u"ą✈✈".encode("utf-8") # changed from .decode("utf-8")
file5.write(u"ą✈✈".encode("utf-8")) # changed from .decode("utf-8")
print u"ą✈✈".encode("utf-8")
file6.write(u"ą✈✈".encode("utf-8"))

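As someone said, when I use codecs I don't need to call encode() every time before writing to a file. :) The question is, which answer should be marked as correct?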

Best answer

You were lucky that your console's default encoding is UTF-8.

If you pass a unicode object to the write method of a file object (sys.stdout), it is implicitly encoded using that file object's encoding attribute.
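To make the role of the console encoding concrete, here is a minimal sketch that performs by hand the encoding step a file object would do for you. The helper name try_encode is hypothetical, and cp852 is only an assumed example of a Windows cmd code page; your console may use a different one. The snippet is written so that it should run on both Python 2.6 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u"ą✈✈"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent the text.

    Hypothetical helper, used only for this illustration.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# A UTF-8 console (such as the PyDev one) can represent every character.
utf8_bytes = try_encode(text, "utf-8")

# cp852 is an assumed example of a Windows cmd code page: it has "ą" but no "✈".
cp852_bytes = try_encode(text, "cp852")

# ASCII is the usual Python 2 fallback when no encoding is known.
ascii_bytes = try_encode(text, "ascii")

print(repr(utf8_bytes))
print(cp852_bytes)
print(ascii_bytes)
```

Only the UTF-8 case yields bytes; the other two fail, which matches the observation that the same script works in PyDev and breaks in cmd.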

Those who work on Windows are not so lucky: How to workaround Python "WindowsError messages are not properly encoded" problem?
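As for storing downloaded content in a UTF-8 file, one way to avoid depending on luck is to decode the bytes once on input and let the file object encode on output. The sketch below is an assumption-laden illustration, not the only approach: the bytes stand in for urllib.urlopen(some_url).read(), "iso-8859-2" is a made-up header value, and page.txt is a hypothetical file name. It uses io.open, which takes an encoding argument much like the codecs.open call in the question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for urllib.urlopen(some_url).read(): raw bytes in the page's encoding.
# "iso-8859-2" is an assumed header value, chosen only for illustration.
encoding = "iso-8859-2"
content = u"zażółć gęślą jaźń".encode(encoding)

# Decode once, right after reading the bytes, to get a unicode object.
ucontent = content.decode(encoding)

# Encode once, on output: the file object encodes every write() to UTF-8,
# so no explicit .encode("utf-8") call is needed.
path = os.path.join(tempfile.mkdtemp(), "page.txt")  # hypothetical output file
with io.open(path, "w", encoding="utf-8") as out:
    out.write(ucontent)

# Reading back with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as src:
    round_trip = src.read()
```

Because the text is unicode everywhere between the decode and the write, the result does not depend on the console or on the interpreter's default encoding.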

Regarding "python - Why do all of these unicode commands work properly in Python? Whatever I do, they print my characters correctly", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8963616/
