According to this test:
# -*- coding: utf-8 -*-
# Python 2 only: unicode() and str.decode() no longer exist in Python 3.
ENCODING = 'utf-8'

# what is the difference between decode and unicode?
test_cases = [
    'aaaaa',
    'ááááá',
    'ℕℤℚℝℂ',
]
FORMAT = '%-10s %5d %-10s %-10s %5d %-10s %10s'
for text in test_cases:
    decoded = text.decode(ENCODING)
    unicoded = unicode(text, ENCODING)
    equal = decoded == unicoded
    print FORMAT % (decoded, len(decoded), type(decoded), unicoded, len(unicoded), type(unicoded), equal)
there is no difference between .decode() and unicode():
aaaaa 5 <type 'unicode'> aaaaa 5 <type 'unicode'> True
ááááá 5 <type 'unicode'> ááááá 5 <type 'unicode'> True
ℕℤℚℝℂ 5 <type 'unicode'> ℕℤℚℝℂ 5 <type 'unicode'> True
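For readers on Python 3, where unicode no longer exists, a rough analogue of the test above is sketched below. It assumes that the Python 3 counterparts are bytes.decode() and the two-argument constructor str(bytes, encoding); the Python 3 documentation for str() describes str(bytes, encoding, errors) as equivalent to bytes.decode(encoding, errors). This mapping is an assumption of the sketch, not part of the original question.

```python
# Python 3 sketch (assumption: bytes.decode() and str(bytes, encoding) stand in
# for Python 2's str.decode() and unicode(); the question itself is Python 2).
ENCODING = 'utf-8'

# Same three samples as the question, written as escapes so the file is ASCII-only.
test_cases = [
    'aaaaa',
    '\u00e1' * 5,                      # ááááá
    '\u2115\u2124\u211a\u211d\u2102',  # ℕℤℚℝℂ
]


def compare(text, encoding=ENCODING):
    """Encode text to bytes, decode it both ways, and report whether they match."""
    raw = text.encode(encoding)       # Python 3 str -> bytes
    decoded = raw.decode(encoding)    # method form
    constructed = str(raw, encoding)  # constructor form
    return decoded, constructed, decoded == constructed


for text in test_cases:
    decoded, constructed, equal = compare(text)
    # Print lengths, type names and the comparison only, so the output stays ASCII.
    print(len(decoded), type(decoded).__name__,
          len(constructed), type(constructed).__name__, equal)
```

Each row prints 5 str 5 str True, mirroring the Python 2 output in the question.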
Am I right? If so, why are there two different ways to do the same thing? Which one should I use? Are there any subtle differences?
Best answer
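Comparing the documentation for the two functions (here and here), the difference between the two methods does seem to be very small. The unicode function is documented as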
If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), a ValueError is raised on errors, ...
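The unicode() documentation quoted above mentions an encoding lookup and an errors argument. Assuming Python 3's str(bytes, encoding, errors) is the closest counterpart, the sketch below shows both behaviours, plus one Python 3 caveat. The codec name 'no-such-codec' is a deliberately made-up, invalid name used only to trigger the lookup failure.

```python
# Python 3 sketch of the constructor form str(bytes, encoding, errors),
# assumed here to be the closest analogue of unicode(string, encoding, errors).
raw = b'caf\xc3\xa9 \xff'  # valid UTF-8 for 'café ', then a stray 0xff byte


def decode_with(errors):
    """Decode raw as UTF-8 using the given error-handler name."""
    return str(raw, 'utf-8', errors)


print(ascii(decode_with('replace')))  # the bad byte becomes U+FFFD
print(ascii(decode_with('ignore')))   # the bad byte is dropped

# An unknown encoding name raises LookupError, as the unicode() docs describe.
# 'no-such-codec' is a hypothetical, intentionally invalid codec name.
try:
    str(raw, 'no-such-codec')
except LookupError as exc:
    print('LookupError:', exc)

# Caveat: with no encoding argument, Python 3's str() does not decode at all;
# it returns the repr of the bytes object.
print(str(b'abc'))
```

This is the one place where the Python 3 analogues clearly diverge: Python 2's unicode(s) without an encoding decoded with the default encoding, while Python 3's str(b) does not decode.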
whereas the description of string.decode states
Decodes the string using the codec registered for encoding. encoding defaults to the default string encoding. errors may be given to set a different error handling scheme. The default is 'strict', meaning that encoding errors raise UnicodeError. ...
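The two quotes name different exceptions (ValueError for unicode(), UnicodeError for decode()), but these are not mutually exclusive: in both Python 2 and Python 3, UnicodeError is a subclass of ValueError, and undecodable bytes raise UnicodeDecodeError, a subclass of UnicodeError. The sketch below checks this at runtime in Python 3, again assuming bytes.decode() and str(bytes, encoding) as the stand-ins.

```python
# Python 3 sketch: which exception a strict decode failure actually raises.
bad = b'\xff'  # 0xff never appears in well-formed UTF-8


def catch(decode):
    """Call a zero-argument decoding function; return the exception it raises, or None."""
    try:
        decode()
    except Exception as exc:  # broad on purpose: we only want to inspect the type
        return exc
    return None


method_error = catch(lambda: bad.decode('utf-8'))  # method form
ctor_error = catch(lambda: str(bad, 'utf-8'))      # constructor form

for err in (method_error, ctor_error):
    print(type(err).__name__, isinstance(err, UnicodeError), isinstance(err, ValueError))

# The class hierarchy explains why both documented exception types are accurate.
print([cls.__name__ for cls in UnicodeDecodeError.__mro__[:3]])

# In Python 3, bytes.decode() with no arguments defaults to UTF-8.
print(b'abc'.decode())
```

Both forms print UnicodeDecodeError True True, so code that catches either ValueError or UnicodeError handles failures from both.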
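So the only differences seem to be that unicode also works on character buffers, and that the documented exception for invalid input differs (ValueError vs. UnicodeError). Another subtle difference is backward compatibility: unicode is documented as "New in version 2.0", while string.decode is "New in version 2.2".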
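The "character buffer" difference has a rough Python 3 counterpart: the documentation for str() says that when an encoding is given, the object may be any bytes-like object, whereas a .decode() method exists only on types that define one, such as bytes and bytearray. The sketch below assumes that mapping and checks it for bytes, bytearray and memoryview.

```python
# Python 3 sketch: the constructor form accepts any bytes-like object,
# while .decode() only exists on types that define it.
raw = '\u2115\u2124'.encode('utf-8')  # two double-struck letters, 6 bytes of UTF-8

buffers = {
    'bytes': raw,
    'bytearray': bytearray(raw),
    'memoryview': memoryview(raw),
}

for name, buf in buffers.items():
    via_str = str(buf, 'utf-8')          # works for any buffer-protocol object
    has_method = hasattr(buf, 'decode')  # memoryview has no .decode() method
    print(name, ascii(via_str), has_method)
```

All three decode to the same text through str(), but only bytes and bytearray report True for having a decode method.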
Given the above, which method to use seems to be purely a matter of personal preference.
A similar question, "python - What is the difference between decode and unicode?", can be found on Stack Overflow: https://stackoverflow.com/questions/20656096/