python - What is the difference between decode and unicode?

According to this test:

# -*- coding: utf-8 -*-

ENCODING = 'utf-8'

# what is the difference between decode and unicode?
test_cases = [
    'aaaaa',
    'ááááá',
    'ℕℤℚℝℂ',
]
FORMAT = '%-10s %5d %-10s %-10s %5d %-10s %10s'
for text in test_cases:
    decoded = text.decode(ENCODING)
    unicoded = unicode(text, ENCODING)
    equal = decoded == unicoded
    print FORMAT % (decoded, len(decoded), type(decoded), unicoded, len(unicoded), type(unicoded), equal)

there is no difference between .decode() and unicode():

aaaaa          5 <type 'unicode'> aaaaa          5 <type 'unicode'>       True
ááááá          5 <type 'unicode'> ááááá          5 <type 'unicode'>       True
ℕℤℚℝℂ          5 <type 'unicode'> ℕℤℚℝℂ          5 <type 'unicode'>       True

Am I right? If so, why do we have two different ways of doing the same thing? Which one should I use? Are there any subtle differences?

Best answer

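Comparing the documentation for the two functions, the differences between the two approaches do indeed seem small. The unicode function is documented as: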

If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), a ValueError is raised on errors, ...

while the description of string.decode states:

Decodes the string using the codec registered for encoding. encoding defaults to the default string encoding. errors may be given to set a different error handling scheme. The default is 'strict', meaning that encoding errors raise UnicodeError. ...

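So the only differences seem to be that unicode also works on character buffers, and that the documented error for invalid input differs (ValueError versus UnicodeError). Even that second difference is mostly nominal, since UnicodeError is itself a subclass of ValueError. Another subtle difference is backward compatibility: unicode is documented as "New in version 2.0", while string.decode is "New in version 2.2".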
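To make those two points concrete, here is a minimal sketch rather than anything taken from the documentation. It is written so that it should run on both Python 2 and Python 3, which needs two assumptions of mine: on Python 3 the name unicode is aliased to str (whose str(bytes, encoding) form is the closest counterpart), and memoryview stands in for the Python 2 buffer type. The helper names error_type, via_method and via_builtin are hypothetical and exist only for this illustration.

```python
import sys

# Assumption: Python 3 has no unicode() or buffer(); str(bytes, encoding)
# and memoryview are used as the nearest counterparts so the sketch still runs.
if sys.version_info[0] >= 3:
    unicode = str
    make_buffer = memoryview
else:
    make_buffer = buffer  # Python 2 only

ENCODING = 'utf-8'
valid = b'\xc3\xa1\xc3\xa1'   # UTF-8 bytes for two 'á' characters
invalid = b'\xff'              # 0xFF never occurs in valid UTF-8


def error_type(func, data):
    """Return the exception class raised by func(data), or None."""
    try:
        func(data)
    except Exception as exc:
        return type(exc)
    return None


def via_method(data):
    # The method form: data.decode(encoding)
    return data.decode(ENCODING)


def via_builtin(data):
    # The built-in form: unicode(data, encoding)
    return unicode(data, ENCODING)


# Both forms give the same text for valid input.
same_result = via_method(valid) == via_builtin(valid)

# The built-in form also accepts a buffer-like object.
buf = make_buffer(valid)
from_buffer = unicode(buf, ENCODING)
buffer_has_decode = hasattr(buf, 'decode')

# Compare the exception classes raised for invalid input.
method_error = error_type(via_method, invalid)
builtin_error = error_type(via_builtin, invalid)

print(same_result)
print(from_buffer == via_method(valid))
print(buffer_has_decode)
print(method_error is builtin_error)
print(issubclass(method_error, UnicodeError))
print(issubclass(UnicodeError, ValueError))
```

Both forms raise UnicodeDecodeError on the invalid byte, and that class derives from UnicodeError, which in turn derives from ValueError. An except ValueError clause therefore catches the failure whichever form is used, so the practical difference shown here is only that the built-in form can decode a buffer object directly.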

Given the above, which method to use seems to be purely a matter of personal preference.

Regarding "python - What is the difference between decode and unicode?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20656096/
