python - Encoding headache with ’ characters in Python, Excel

I'm reading a txt file into Python, extracting parts of it, and then writing the result out as a CSV.

The problem is that I'm running into encoding issues I don't know how to solve. Here is what happens:

Then I read it into Python like this:

```
inputfile=codecs.open(inputfile, "r", "utf-8")
```
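For reference, a minimal sketch of reading a UTF-8 file into text (unicode) strings with io.open, which exists in both Python 2.6+ and Python 3 (where it is the built-in open). The file name input.txt and the sample line are hypothetical; the script writes the file first only so it is self-contained.

```python
import io
import os
import tempfile

# Hypothetical sample line containing the curly apostrophe U+2019.
sample = u"23 It\u2019s lying down there on the floor.\n"

# Hypothetical input file, created here so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# io.open decodes the UTF-8 bytes, so every line comes back as text, not bytes.
with io.open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()

print(repr(lines[0]))
```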

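I run a regular expression to extract parts of it and turn them into a pandas DataFrame (called "dataframe" here).
The script then writes the DataFrame out as a CSV file, but whatever I do, I run into problems. I tried: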
```
outputfile=codecs.open(outputfile, "w", "utf-8")
dataframe.to_csv(outputfile, encoding="utf-8")
```

But this gives me:

```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)
```

Questions:

This is the first thing I don't understand: if I'm setting encoding="utf-8" in to_csv, why is the 'ascii' codec involved at all? According to the docs, encoding is

A string representing the encoding to use in the output file, defaults to ‘ascii’ on Python 2...
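A likely explanation, hedged because it depends on the Python 2 and pandas versions involved: with encoding="utf-8", pandas on Python 2 encodes the rows to UTF-8 byte strings itself, and then writes those bytes into the codecs.open() writer, which expects unicode. Python 2 silently converts the byte string to unicode with its default 'ascii' codec first, and that implicit decode fails on 0xe2, the first byte of ’ in UTF-8. The sketch below (Python 3) performs that same decode explicitly to reproduce the message.

```python
# UTF-8 encoding of "It’s": the apostrophe U+2019 becomes the bytes E2 80 99.
utf8_bytes = u"It\u2019s".encode("utf-8")

message = ""
try:
    # Python 2 did this step implicitly when a byte string reached a
    # codecs writer that expected unicode text.
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If that is what happens, letting only one layer encode should avoid the error, for example passing a plain path, as in dataframe.to_csv("output.csv", encoding="utf-8") (the file name is hypothetical), instead of a codecs-wrapped file object.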

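I can avoid this error by not specifying "utf-8" in codecs.open(). However, once I import the file into Excel (with the import set to "Unicode-UTF-8"), every ’ character shows up as __. As far as I can tell there are no other encoding errors, and if I open the CSV file in TextWrangler, everything looks fine.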

I'm using Python 2 on a Mac. I'm not using the Python csv module because it doesn't handle UTF-8 without workarounds.
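On Python 3 the csv module does handle Unicode without workarounds. A commonly suggested trick for Excel is to write a UTF-8 byte order mark at the start of the file (the utf-8-sig codec), which many Excel versions use to recognize UTF-8; whether the asker's Mac Excel honors it is an assumption, not something the question confirms. pandas' to_csv accepts the same codec name through its encoding parameter. A standard-library sketch, with hypothetical rows and file name:

```python
import codecs
import csv
import os
import tempfile

# Hypothetical rows standing in for the extracted data.
rows = [["id", "text"], ["23", u"It\u2019s lying down there on the floor."]]

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# 'utf-8-sig' emits the byte order mark EF BB BF before the data.
# newline="" is what the csv docs require for files passed to csv.writer.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the raw bytes back to check what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw[:3] == codecs.BOM_UTF8)
```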

Any help is appreciated!

Edit: here is what the input file looks like in TextWrangler:

23 It’s lying down there on the floor.

And here it is in Excel:

It__s lying down there on the floor.

Following Fawful's helpful comment, I also tried opening the original text file in Excel. It seems to already render ’ as __ in that encoding.
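A sketch of how this kind of garbling typically arises: the three UTF-8 bytes of ’ get decoded with a single-byte legacy code page, so one character turns into three. Which code page Excel actually applies on the asker's machine is unknown (the question only shows __), so Mac Roman and Windows-1252 below are assumptions chosen as the usual suspects.

```python
# The curly apostrophe U+2019 is encoded in UTF-8 as the bytes E2 80 99.
utf8_bytes = u"It\u2019s".encode("utf-8")

# Decoding those bytes with a legacy code page instead of UTF-8.
as_mac_roman = utf8_bytes.decode("mac_roman")  # → It‚Äôs
as_cp1252 = utf8_bytes.decode("cp1252")        # → Itâ€™s

print(as_mac_roman)
print(as_cp1252)
```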

Best answer

It's not a clean solution, but as a quick fix, just use .replace('\xe2', ' ').
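One caveat about a byte-level replace: ’ takes three bytes in UTF-8 (E2 80 99), so replacing only the 0xE2 byte leaves the other two behind. A sketch (Python 3) comparing that with decoding first and replacing the whole character; substituting a plain ASCII apostrophe is an assumption about the desired output.

```python
utf8_bytes = u"It\u2019s lying down there on the floor.".encode("utf-8")

# Byte-level replace: only the lead byte goes, 0x80 0x99 remain.
byte_level = utf8_bytes.replace(b"\xe2", b" ")

# Text-level replace: decode, then swap the complete U+2019 character.
cleaned = utf8_bytes.decode("utf-8").replace(u"\u2019", u"'")

print(repr(byte_level))
print(cleaned)
```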

Regarding python - Encoding headache with ’ characters in Python, Excel, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38059812/