python - How to clean up \xc2\xa0\xc2\xa0..... in text data

When I try to read a text file with the following Python code:

     with open(file, 'r') as myfile:
          data = myfile.read()

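the result contains some strange characters that start with \x.... What do they represent, and how can I get rid of them when reading the text file?

For example:

......\xc2\xa0\xc2\xa0 chapter 1 tuesday 1984 \xe2\x80\x9chey, jack, your mom sent me to pick you up\xe2\x80\x9d jacob robbins knew better than to accept a ride from a stranger, but when his mom\xe2\x80\x99s friend ronny was waiting for him in front of school he reluctantly got in the car \xe2\x80\x9cmy name is jacob.......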

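Best answer

This is UTF-8 encoded text: \xc2\xa0 is the non-breaking space (U+00A0), and \xe2\x80\x9c, \xe2\x80\x9d and \xe2\x80\x99 are the curly quotation marks “, ” and ’. Open the file as UTF-8 so that the bytes are decoded into those characters.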

3.x:

with open(file, 'r', encoding='utf-8') as myfile:
   ...

2.x:

import codecs

with codecs.open(file, 'r', encoding='utf-8') as myfile:
   ...
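As a quick check of what the escapes in the question stand for, the sketch below (Python 3) decodes the same byte sequences by hand and then reads them back from a file opened as UTF-8. The shortened sample text and the temporary file are assumptions made for illustration; they are not part of the original question.

```python
import os
import tempfile

# Byte sequences copied from the question's sample; the surrounding words are shortened.
raw = b"\xc2\xa0\xc2\xa0 chapter 1 \xe2\x80\x9chey, jack\xe2\x80\x9d his mom\xe2\x80\x99s friend"

# Decoding as UTF-8 turns each multi-byte sequence into a single character.
decoded = raw.decode("utf-8")

# Write the raw bytes to a throwaway file, then read it back with an explicit encoding.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8") as myfile:
        data = myfile.read()
finally:
    os.remove(path)

# repr() shows the non-breaking spaces as \xa0; the quotes appear as real characters.
print(repr(data))
```

After decoding, the text is an ordinary string in which the non-breaking space and the curly quotes are single characters rather than runs of \x.. bytes.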

Unicode In Python, Completely Demystified
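Decoding alone keeps the non-breaking spaces and curly quotes as legitimate characters. If the goal is to replace them with plain ASCII equivalents, one possible approach is a translation table, sketched below. The helper name `clean_text` and the choice of replacements are assumptions for illustration, not part of the original answer.

```python
# Hypothetical mapping from typographic characters to plain ASCII equivalents.
REPLACEMENTS = str.maketrans({
    "\u00a0": " ",   # non-breaking space -> regular space
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
})


def clean_text(text):
    """Return text with the characters above replaced (hypothetical helper)."""
    return text.translate(REPLACEMENTS)


# Sample string as it would look after reading the file as UTF-8.
sample = "\u00a0\u00a0 chapter 1 \u201chey, jack\u201d his mom\u2019s friend"
cleaned = clean_text(sample)
print(cleaned)
```

Whether to normalize these characters at all depends on the downstream use; for many text-processing tasks the decoded string can be used as it is.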

Regarding "python - How to clean up \xc2\xa0\xc2\xa0..... in text data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45889265/
