python - UnicodeDecodeError in Python 2.7

I am trying to read a UTF-8 encoded XML file in Python, and I do some processing on the lines read from the file, like this:

next_sent_separator_index = doc_content.find(word_value, int(characterOffsetEnd_value) + 1)

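Here doc_content is a line read from the file, and word_value is one of the strings from that same line. Whenever doc_content or word_value contains non-ASCII Unicode characters, I get an encoding-related error on the line above. So I tried to decode both of them first as UTF-8 (instead of the default ASCII encoding), like this: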

next_sent_separator_index = doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)

But I still get the following error (note that it is a UnicodeEncodeError, raised from inside the .decode call):

Traceback (most recent call last):
  File "snippetRetriver.py", line 402, in <module>
    sentences_list,lemmatised_sentences_list = getSentenceList(form_doc)
  File "snippetRetriver.py", line 201, in getSentenceList
    next_sent_separator_index =  doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 8: ordinal not in range(128)

Can anyone suggest a proper way to avoid this kind of encoding error in Python 2.7?
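A minimal sketch of the setup described in the question, assuming the file can be read line by line as plain text: `io.open` with `encoding='utf-8'` (available in both Python 2.7 and 3) decodes each line once while reading, so `doc_content` is already unicode and `find` needs no further `.decode` calls. The helper `find_next` and the sample file contents are hypothetical; a real XML file would usually go through an XML parser rather than being scanned line by line.

```python
import io
import os
import tempfile


def find_next(doc_content, word_value, characterOffsetEnd_value):
    # Hypothetical helper: both text arguments are expected to be unicode already,
    # so no decoding happens here.
    return doc_content.find(word_value, int(characterOffsetEnd_value) + 1)


# Write a small UTF-8 sample file (hypothetical content) so the sketch runs standalone.
sample = u"Caf\xe9 ouvert. Caf\xe9 ferm\xe9.\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8") as f:
    f.write(sample)

# io.open decodes the bytes as UTF-8 while reading, so every line is unicode text.
indices = []
with io.open(path, "r", encoding="utf-8") as f:
    for doc_content in f:
        indices.append(find_next(doc_content, u"Caf\xe9", "3"))

os.remove(path)
print(indices)  # [13]
```

The point of the design is to decode exactly once, at the boundary where bytes enter the program, and to keep everything as unicode afterwards.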

Best answer

codecs.utf_8_decode(input.encode('utf8'))
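The traceback points at the cause: in Python 2.7, calling `.decode('utf-8')` on a value that is already `unicode` makes Python first encode it with the default ASCII codec, which is why a `UnicodeEncodeError` appears inside `decode`. Note also that `codecs.utf_8_decode` returns a `(text, length)` tuple rather than a plain string. A minimal sketch of a more general fix, assuming the inputs may be either UTF-8 byte strings or unicode: decode only the values that are still bytes. The helper name `to_text` and the sample values are hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return value as unicode text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):  # bytes is an alias of str in Python 2.7
        return value.decode(encoding)
    return value  # already unicode: calling .decode on it would trigger the error


# Hypothetical inputs: one UTF-8 byte string and one value that is already unicode.
doc_content = b"Caf\xc3\xa9 ouvert. Caf\xc3\xa9 ferm\xc3\xa9."
word_value = u"Caf\xe9"
characterOffsetEnd_value = "3"

next_sent_separator_index = to_text(doc_content).find(
    to_text(word_value), int(characterOffsetEnd_value) + 1)
print(next_sent_separator_index)  # 13
```

Because `to_text` is a no-op on unicode input, it is safe to call on values whose type is uncertain, for example strings coming back from an XML parser.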

For the original question (python - UnicodeDecodeError in Python 2.7), see Stack Overflow: https://stackoverflow.com/questions/10872208/
