Pandas DataFrame: remove � (unknown-character) from strings in rows

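I have read a csv file into Python 2.7 (on a Windows machine). The Sales Price column appears to be a mix of strings and floats. Some rows contain the euro symbol €, and Python displays € as �.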

df = pd.read_csv('sales.csv', thousands=',')
print df

Gender  Size    Color   Category    Sales Price
Female  36-38   Blue    Socks       25
Female  44-46   Pink    Socks       13.2
Unisex  36-38   Black   Socks      � 19.00
Unisex  40-42   Pink    Socks      � 18.50
Female  38      Yellow  Pants      � 89,00
Female  43      Black   Pants      � 89,00

I assumed a simple replace line would solve it:

df=df.replace('\�','',regex=True).astype(float)

But I got an encoding error:

SyntaxError: Non-ASCII character

I would love to hear your thoughts on this.

Best answer

I think @jezrael's comment is valid. First, you need to read the file with an explicit encoding (see the encoding parameter at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html):

df=pd.read_csv('sales.csv', thousands=',', encoding='utf-8')
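The encoding argument only helps if it matches how the file was really saved. The sketch below is an illustration under an assumption the question does not confirm: that the file came from a Windows tool that wrote it as cp1252, where the euro sign is the single byte 0x80. The name raw_cell is made up for the example.

```python
from __future__ import print_function, unicode_literals

# Hypothetical raw bytes of one CSV cell (assumption: the file is cp1252).
raw_cell = b'\x80 19.00'

# Decoding with the matching encoding recovers the euro sign (U+20AC).
decoded = raw_cell.decode('cp1252')
print(decoded == '\u20ac 19.00')

# The same bytes are not valid UTF-8, so decoding them as UTF-8 raises.
try:
    raw_cell.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```

If that assumption holds for your file, passing encoding='cp1252' to pd.read_csv in place of 'utf-8' would be the corresponding change.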

Then, to remove the euro symbol, try the following:

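# Python 2 needs the u'' prefix for \u20AC to mean the euro sign; only the price column is converted
df['Sales Price'] = df['Sales Price'].replace(u'\u20AC', '', regex=True).astype(float)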
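Note that a value such as 89,00 will still fail to convert to float once the euro sign is gone, because of the comma. The following is a minimal sketch of a per-cell cleaner, assuming the comma in those values is a decimal separator (the question does not say so explicitly). parse_price and EURO are hypothetical names, not part of pandas.

```python
from __future__ import print_function, unicode_literals

EURO = '\u20ac'  # euro sign written as an escape, so the source file stays ASCII


def parse_price(value):
    # Hypothetical helper, not part of pandas: clean one Sales Price cell.
    if isinstance(value, (int, float)):
        return float(value)  # already numeric, nothing to clean
    text = value.replace(EURO, '').strip()
    # Assumption: a comma is a decimal separator ("89,00" means 89.0).
    text = text.replace(',', '.')
    return float(text)


# Sample cells modelled on the rows shown in the question.
samples = [25, 13.2, '\u20ac 19.00', '\u20ac 18.50', '\u20ac 89,00']
cleaned = [parse_price(v) for v in samples]
print(cleaned)
```

On a DataFrame, the same helper could be applied with df['Sales Price'] = df['Sales Price'].map(parse_price). Writing the symbol as an escape rather than pasting it into the script also avoids the "Non-ASCII character" SyntaxError from the question.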

This question, "Pandas DataFrame: remove � (unknown-character) from strings in rows", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/45077507/
