Python: bytes to a string with accented characters

Git reports the filename "ùàèòùèòùùè.txt" as a plain byte string, so when I ask git for the list of committed files, I get the following string:

r"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"

How can I turn it back into "ùàèòùèòùùè.txt" using Python 2?

Best answer

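If the git output contains literal \ddd octal sequences (so up to 4 characters per filename byte), you can use the string_escape (Python 2) or unicode_escape (Python 3) codec to have Python interpret the escape sequences.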
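To make the \ddd notation concrete: each sequence is a backslash followed by three octal digits naming one byte, so \303\271 is the byte pair 0xC3 0xB9, which is the UTF-8 encoding of "ù". The following Python 3 sketch decodes only those octal escapes by hand. The names OCTAL_ESCAPE and unescape_octal are hypothetical and not part of the original answer; the sketch assumes the input contains no other escape forms.

```python
import re

# Hypothetical helper, not from the original answer: it handles only
# three-digit octal escapes (\ddd) and leaves every other byte untouched.
OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unescape_octal(raw: bytes) -> bytes:
    """Replace each literal backslash-ddd sequence with the byte it names."""
    return OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)


git_data = rb"\303\271\303\240.txt"
raw_bytes = unescape_octal(git_data)
print(raw_bytes)  # b'\xc3\xb9\xc3\xa0.txt'

# The result is UTF-8 encoded data; decode it to get text.
text = raw_bytes.decode("utf-8")
```

The codecs named above do the same job in one call and also interpret other escapes such as \n or \t, which may or may not be what you want for a given input.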

The result is UTF-8 encoded data; my terminal is set up to interpret UTF-8 directly, so printing it shows the accented characters:

>>> git_data = r"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
>>> git_data.decode('string_escape')
'\xc3\xb9\xc3\xa0\xc3\xa8\xc3\xb2\xc3\xb9\xc3\xa8\xc3\xb2\xc3\xb9\xc3\xb9\xc3\xa8.txt'
>>> print git_data.decode('string_escape')
ùàèòùèòùùè.txt

To get text, decode that result as UTF-8:

>>> git_data.decode('string_escape').decode('utf8')
u'\xf9\xe0\xe8\xf2\xf9\xe8\xf2\xf9\xf9\xe8.txt'
>>> print git_data.decode('string_escape').decode('utf8')
ùàèòùèòùùè.txt

In Python 3, the unicode_escape codec gives you (Unicode) text, so an extra encode to Latin-1 is needed to turn it back into bytes before decoding as UTF-8:

>>> git_data = rb"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
>>> git_data.decode('unicode_escape').encode('latin1').decode('utf8')
'ùàèòùèòùùè.txt'

Note that git_data is a bytes object before decoding.
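The Python 3 chain above can be wrapped in a small function. This is a minimal sketch, not part of the original answer: the name git_escaped_to_text is hypothetical, and it assumes the input is a bytes object holding only the escaped name, with UTF-8 as the underlying filename encoding. The Latin-1 step works because Latin-1 maps code points 0 to 255 one-to-one onto byte values 0 to 255, so it restores exactly the bytes the escapes described.

```python
def git_escaped_to_text(raw: bytes) -> str:
    """Turn a byte string with literal \\ddd escapes into text.

    Hypothetical helper; assumes the escaped bytes are UTF-8 encoded.
    """
    # Step 1: interpret the escapes. Each \ddd becomes one code point
    # in the range 0-255, so the result is str, not bytes.
    escaped = raw.decode("unicode_escape")
    # Step 2: map those code points back to single bytes.
    as_bytes = escaped.encode("latin1")
    # Step 3: decode the real UTF-8 data to get the filename as text.
    return as_bytes.decode("utf-8")


git_data = rb"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
filename = git_escaped_to_text(git_data)
print(len(filename))  # 14: ten accented letters plus ".txt"
```

If the filenames are not valid UTF-8, the last step raises UnicodeDecodeError, so callers may want to catch that or pass an errors argument to decode.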

Source: a similar question on Stack Overflow, "Python: bytes to a string with accented characters": https://stackoverflow.com/questions/30890099/
