python - What is the difference between "# -*- coding: utf-8 -*-", "from __future__ import unicode_literals" and "sys.setdefaultencoding('utf8')"?

What I know so far:

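# -*- coding: utf-8 -*-
It declares the encoding of the Python source file. Once I set the encoding name, the Python parser interprets the file using that encoding. I call this the "file encoding".
from __future__ import unicode_literals
I am working with Python 2.7, and I use from __future__ import unicode_literals to change the default type of string literals from "str" to "unicode". I call this the "string encoding".
sys.setdefaultencoding('utf8')
Sometimes I still get errors in Django. For example, I stored Chinese text through the admin and then visited the related page: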

UnicodeEncodeError at /admin/blog/vulpaper/29/change/
'ascii' codec can't encode characters in position 6-13: ordinal not in range(128)
... (more error information)
The string that could not be encoded/decoded was: emcms外贸网站管理系统

To fix this, I put sys.setdefaultencoding('utf8') in the Django settings file.

But I do not actually understand the technical details behind any of the above.

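What confuses me:
1. Since I have already set the source file encoding, why do I also need to set the string encoding to make sure my strings use the encoding I want? What is the difference between the "file encoding" and the "string encoding"?
2. Since I have set both the "file encoding" and the "string encoding", why do I still get a UnicodeEncodeError?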

Best answer

You usually need both the file encoding and unicode string literals, but they control very different things, and it helps to understand the difference.

File encoding

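If you want to write unicode characters anywhere in the source code (for example in comments or string literals), you need to declare the encoding so that the Python parser can process the file. Setting the wrong encoding can cause a SyntaxError exception. PEP 263 explains the problem in detail, along with how to control the parser's encoding: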

In Python 2.1, Unicode literals can only be written using the Latin-1 based encoding "unicode-escape". This makes the programming environment rather unfriendly to Python users who live and work in non-Latin-1 locales such as many of the Asian countries.

...

Python will default to ASCII as standard encoding if no other encoding hints are given.
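
As a rough illustration of what the declaration controls, the sketch below compiles the same raw bytes under two different coding declarations. The helper name literal_under and the sample bytes are made up for this example. It relies only on the built-in compile() honouring the PEP 263 comment when it is given byte-string source.

```python
def literal_under(encoding_name):
    """Compile a tiny module whose literal holds the bytes C3 A9; return the literal."""
    source = (
        b"# -*- coding: " + encoding_name.encode("ascii") + b" -*-\n"
        b"text = u'\xc3\xa9'\n"  # two raw bytes sit between the quotes
    )
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["text"]


# Read as UTF-8, the two bytes form one character (U+00E9).
print(len(literal_under("utf-8")))
# Read as Latin-1, the same two bytes become two separate characters.
print(len(literal_under("latin-1")))
```

The bytes on disk are identical in both cases. Only the declaration changes how the parser turns them into text.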

Unicode string literals

Python 2 uses two different string types: unicode and str. When you define a string literal, the interpreter actually creates a new object of type str to hold it.

s = "A literal string"
print type(s)

<type 'str'>

TL;DR

If you want to change this behavior and instead create a unicode object every time an unprefixed string literal is defined, you can use from __future__ import unicode_literals.
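
The import applies per module, so to compare both behaviours in one script this sketch compiles two tiny modules from strings. The helper literal_type and the two source strings are hypothetical names for this example. On Python 3 every literal is already text, so both lines print the same type there.

```python
def literal_type(source):
    """Run a small module given as a string and return the type of its variable s."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return type(namespace["s"])


without_import = 's = "A literal string"\n'
with_import = "from __future__ import unicode_literals\n" + without_import

# On Python 2 the first is str and the second is unicode.
print(literal_type(without_import))
print(literal_type(with_import))
```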

Read on if you want to understand why this is useful.

You can use the u prefix to define a string literal explicitly as unicode. The interpreter will create a unicode object for that literal.

s = u"A literal string"
print type(s)

<type 'unicode'>

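For ASCII text the str type is sufficient, but if you intend to manipulate non-ASCII text, it is important to use the unicode type so that character-level operations work correctly. The following example shows how the character-level interpretation differs between str and unicode for exactly the same literal.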

# -*- coding: utf-8 -*-

def print_characters(s):
    print "String of type {}".format(type(s))
    print "  Length: {} ".format(len(s))
    print "  Characters: " ,
    for c in s:
        print c,
    print
    print


u_lit = u"Γειά σου κόσμε"
s_lit = "Γειά σου κόσμε"

print_characters(u_lit)
print_characters(s_lit)

Output:

String of type <type 'unicode'>
  Length: 14 
  Characters:  Γ ε ι ά   σ ο υ   κ ό σ μ ε

String of type <type 'str'>
  Length: 26 
  Characters:  � � � � � � � �   � � � � � �   � � � � � � � � � �

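With str, the length is wrongly reported as 26 characters, and iterating over the characters returns garbage. unicode, on the other hand, works as expected.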
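
The gap between 14 and 26 is the gap between characters and UTF-8 bytes: each Greek letter takes two bytes and each space takes one. A minimal sketch of that relationship, assuming the source file is saved as UTF-8 (the variable names are only illustrative):

```python
# -*- coding: utf-8 -*-

u_lit = u"Γειά σου κόσμε"

# Encoding gives the byte sequence that a Python 2 str literal would hold
# in a UTF-8 source file.
encoded = u_lit.encode("utf-8")

print(len(u_lit))    # number of characters
print(len(encoded))  # number of bytes

# Decoding the bytes with the right codec recovers the original text.
decoded = encoded.decode("utf-8")
print(decoded == u_lit)
```

So a str read from a file or socket can be turned into unicode with an explicit decode() before any character-level work is done on it.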

Setting sys.setdefaultencoding('utf8')

There is a nice answer on Stack Overflow about why we should not use it :)
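
In Python 2, sys.setdefaultencoding changes the codec used for implicit conversions between str and unicode, which is ASCII by default. One common alternative, which is my assumption and not something the answer above spells out, is to encode and decode explicitly at the point where text leaves or enters the program. The sketch below reproduces the kind of failure from the question with an explicit ASCII encode and then does the conversion with UTF-8. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-

title = u"emcms外贸网站管理系统"

# Encoding with the ASCII codec fails for the Chinese characters, the same
# kind of failure an implicit conversion triggers on Python 2.
try:
    title.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc.reason)

# Naming the codec explicitly works and round-trips without loss.
data = title.encode("utf-8")
print(data.decode("utf-8") == title)
```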

Source: Stack Overflow question, https://stackoverflow.com/questions/50579547/
