python - How is the source encoding applied to string literals?

PEP 263 specifies that the encoding declared in the source is applied in the following order:

1. read the file

2. decode it into Unicode assuming a fixed per-file encoding

3. convert it into a UTF-8 byte string

4. tokenize the UTF-8 content

5. compile it, creating Unicode objects from the given Unicode data and creating string objects from the Unicode literal data by first reencoding the UTF-8 data into 8-bit string data using the given file encoding
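The five steps can be modelled in Python 3 as a minimal sketch. It is not CPython's real tokenizer. It assumes a real byte encoding (`latin-1`, chosen only for illustration), and it uses a hypothetical helper, `str_literal_bytes`, to mimic step 5 for one literal. Because step 5 re-encodes with the same file encoding, a plain (8-bit) str literal ends up holding exactly the bytes that were in the file:

```python
# Sketch of the PEP 263 pipeline for a single string literal.
# Assumption: the declared file encoding is latin-1 (illustrative only).

FILE_ENCODING = "latin-1"

# Step 1: read the file (here: the raw bytes as they would sit on disk).
raw = "print 'caf\u00e9'\n".encode(FILE_ENCODING)

# Step 2: decode into Unicode using the declared per-file encoding.
text = raw.decode(FILE_ENCODING)

# Step 3: convert to a UTF-8 byte string for the tokenizer.
utf8_source = text.encode("utf-8")


def str_literal_bytes(utf8_literal: bytes) -> bytes:
    """Hypothetical helper mimicking step 5 for an 8-bit str literal:
    UTF-8 data -> Unicode -> bytes in the original file encoding."""
    return utf8_literal.decode("utf-8").encode(FILE_ENCODING)


# Step 4 (tokenizing) is reduced here to slicing out the literal body.
start = utf8_source.index(b"'") + 1
end = utf8_source.rindex(b"'")
literal_utf8 = utf8_source[start:end]

# Step 5: a u'' literal becomes the Unicode text; a '' literal is re-encoded.
unicode_literal = literal_utf8.decode("utf-8")
byte_literal = str_literal_bytes(literal_utf8)

print(ascii(unicode_literal))  # 4 code points
print(byte_literal)            # the same bytes that were in the file
```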

So, if I take this code:

print 'abcdefgh'
print u'abcdefgh'

and convert it to ROT-13:

# coding: rot13

cevag 'nopqrstu'
cevag h'nopqrstu'
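The ROT-13 version above can be reproduced with Python 3's `codecs` module, which ships a `rot13` transform. In Python 3 it is a str-to-str codec used through `codecs.encode`/`codecs.decode`, not a usable source encoding. Note that the `u` prefix itself is rotated to `h`:

```python
import codecs

original = "print 'abcdefgh'\nprint u'abcdefgh'\n"

# ROT-13 rotates every ASCII letter by 13 places; quotes, spaces and newlines are untouched.
rotated = codecs.encode(original, "rot13")
print(rotated, end="")
# cevag 'nopqrstu'
# cevag h'nopqrstu'

# ROT-13 is its own inverse, so decoding restores the original text.
assert codecs.decode(rotated, "rot13") == original
```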

I would expect it to be decoded first and then print the same output as the original:

abcdefgh
abcdefgh

However, it prints:

nopqrstu
abcdefgh

So the unicode literal works as expected, but the str literal is left unconverted. Why?

To rule out some possibilities:

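I have confirmed that the problem is not in a later stage (printing to the console). It appears right at parse time, because this code produces "ValueError: unsupported format character 'q' (0x71) at index 1":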

x = '%q' % 1  # that is %d !
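The same failure can be reproduced without any source-encoding declaration. ROT-13 turns `d` into `q`, so if the str literal keeps the text as it appears in the rotated file, `%` formatting sees `%q`. A small Python 3 sketch follows (the variable names are illustrative). The exact wording of the `ValueError` message may vary between Python versions:

```python
import codecs

# What the author meant (before ROT-13) and what the rotated file contains.
intended = "%d"
in_file = codecs.encode(intended, "rot13")  # '%q'

# If the str literal is left as it appears in the file, formatting fails.
message = None
try:
    in_file % 1
except ValueError as exc:
    message = str(exc)
print(message)  # the error names the unsupported 'q' format character
```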

Best answer

I think the last step actually explains quite precisely what happens:

compile it, creating Unicode objects from the given Unicode data and creating string objects from the Unicode literal data by first reencoding the UTF-8 data into 8-bit string data using the given file encoding

After the first 4 steps, the content of the source file is a tokenized Unicode version of the following:

print 'abcdefgh'
print u'abcdefgh'

Then, in step 5, the string object 'abcdefgh' is re-encoded into 8-bit string data using the given file encoding (i.e. rot13), so the content becomes:

print 'nopqrstu'
print u'abcdefgh'
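Step 5 can be mimicked for the ROT-13 file as a minimal sketch. It assumes Python 3's `rot13` text transform stands in for Python 2's rot13 source codec. Because this transform is str-to-str, the "8-bit" result here is a str rather than bytes. Decoding the file yields the original text. Re-encoding the body of the plain str literal with the file encoding, however, rotates `abcdefgh` back to `nopqrstu`:

```python
import codecs

FILE_ENCODING = "rot13"  # stand-in for the declared source encoding

source_in_file = "cevag 'nopqrstu'\ncevag h'nopqrstu'\n"

# Steps 1-4: decode the whole file with the declared encoding.
decoded = codecs.decode(source_in_file, FILE_ENCODING)
print(decoded, end="")
# print 'abcdefgh'
# print u'abcdefgh'

literal_text = "abcdefgh"  # body of both literals after decoding

# Step 5, u'' literal: the decoded Unicode text is used as-is.
unicode_value = literal_text
# Step 5, '' literal: re-encoded with the file encoding, i.e. rotated again.
str_value = codecs.encode(literal_text, FILE_ENCODING)

print(str_value)      # nopqrstu
print(unicode_value)  # abcdefgh
```

This mirrors the observed output: the str literal ends up with the characters exactly as they were written in the file, while only the unicode literal reflects the decoded text.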

For "python - How is the source encoding applied to string literals?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39847215/
