python - Pandas dataframe: strip non-numeric characters

Tags: python pandas

I am working with data of the following form:

Accuracy 26.15%, error rate 0.00%, not classified 73.85%
Accuracy 29.68%, error rate 0.00%, not classified 70.32%
Accuracy 33.98%, error rate 0.00%, not classified 66.02%
Accuracy 35.34%, error rate 0.00%, not classified 64.66%
Accuracy 35.75%, error rate 0.00%, not classified 64.25%
Accuracy 37.51%, error rate 0.00%, not classified 62.49%
Accuracy 38.63%, error rate 0.00%, not classified 61.37%
Accuracy 40.81%, error rate 0.00%, not classified 59.19%
Accuracy 41.22%, error rate 0.00%, not classified 58.78%
Accuracy 41.99%, error rate 0.00%, not classified 58.01%
Accuracy 42.34%, error rate 0.00%, not classified 57.66%
Accuracy 42.40%, error rate 0.00%, not classified 57.60%
Accuracy 43.05%, error rate 0.00%, not classified 56.95%
Accuracy 44.29%, error rate 0.00%, not classified 55.71%
Accuracy 44.35%, error rate 0.00%, not classified 55.65%
Accuracy 44.76%, error rate 0.00%, not classified 55.24%
Accuracy 45.29%, error rate 0.00%, not classified 54.71%
Accuracy 45.35%, error rate 0.00%, not classified 54.65%
Accuracy 95.35%, error rate 4.24%, not classified 0.41%
Accuracy 95.76%, error rate 4.24%, not classified 0.00%
Stats on test data
Accuracy 94.74%, error rate 5.26%, not classified 0.00%

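How can I load this into a pandas DataFrame with the headers 'Accuracy', 'Error rate' and 'Not classified', while stripping the non-numeric characters from the data fields?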

So far I have:

pd.read_csv("test.csv", names=['Accuracy', 'Error rate', 'Not classified'])

But this produces:

    Accuracy    Error rate  Not classified
0   Accuracy 25.85% error rate 0.00%    not classified 74.15%
1   Accuracy 29.92% error rate 0.00%    not classified 70.08%
2   Accuracy 33.69% error rate 0.00%    not classified 66.31%
3   Accuracy 36.16% error rate 0.00%    not classified 63.84%
4   Accuracy 37.16% error rate 0.00%    not classified 62.84%
5   Accuracy 39.28% error rate 0.00%    not classified 60.72%
6   Accuracy 39.58% error rate 0.00%    not classified 60.42%
7   Accuracy 40.05% error rate 0.00%    not classified 59.95%
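The output above follows from how read_csv works: it splits each line only on commas, so every cell keeps its label text and its % sign, and the columns hold strings rather than numbers. A minimal reproduction, assuming two lines copied from the question are held in memory (io.StringIO stands in for test.csv here; that substitution is only for illustration):

```python
import io

import pandas as pd

# Two lines copied from the question; io.StringIO stands in for "test.csv".
raw = (
    "Accuracy 26.15%, error rate 0.00%, not classified 73.85%\n"
    "Accuracy 29.68%, error rate 0.00%, not classified 70.32%\n"
)

df = pd.read_csv(io.StringIO(raw), names=["Accuracy", "Error rate", "Not classified"])

# Each cell is still the full text of its comma-separated field,
# label and percent sign included.
print(df.loc[0, "Accuracy"])        # Accuracy 26.15%
print(df.loc[1, "Not classified"])  # leading space kept:  not classified 70.32%
```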

Best answer

You can do this with pandas.DataFrame.replace():

df.replace(r'[a-zA-Z%]', '', regex=True, inplace=True)
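An end-to-end sketch of the replace step on a small in-memory sample (io.StringIO again stands in for test.csv). Deleting letters and % leaves each number padded with spaces, for example " 26.15", so this version strips the whitespace explicitly before converting rather than relying on how pd.to_numeric treats surrounding spaces:

```python
import io

import pandas as pd

raw = (
    "Accuracy 26.15%, error rate 0.00%, not classified 73.85%\n"
    "Accuracy 95.35%, error rate 4.24%, not classified 0.41%\n"
)
df = pd.read_csv(io.StringIO(raw), names=["Accuracy", "Error rate", "Not classified"])

# Delete every ASCII letter and every '%' inside each cell (regex substring replace).
df = df.replace(r"[a-zA-Z%]", "", regex=True)

# What remains is the number padded with spaces; strip it, then convert.
df = df.apply(lambda col: pd.to_numeric(col.str.strip()))

print(df)
```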

If your end goal is to convert these values to numbers, run:

df = df.apply(pd.to_numeric)  # apply() returns a new DataFrame, so assign it back
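One thing the sample data needs beyond the answer: the line "Stats on test data" is not a data row, so read_csv turns it into a row whose first cell is text and whose other cells are NaN. After the letters are removed, only spaces remain in that cell, which is not a number. A sketch that handles it, assuming it is acceptable to drop such rows: errors="coerce" turns unparseable values into NaN, and dropna() removes the row afterwards.

```python
import io

import pandas as pd

# Includes the non-data line "Stats on test data" from the question.
raw = (
    "Accuracy 95.76%, error rate 4.24%, not classified 0.00%\n"
    "Stats on test data\n"
    "Accuracy 94.74%, error rate 5.26%, not classified 0.00%\n"
)
df = pd.read_csv(io.StringIO(raw), names=["Accuracy", "Error rate", "Not classified"])
df = df.replace(r"[a-zA-Z%]", "", regex=True)

# errors="coerce" makes anything that is not a number (here the leftover
# spaces from "Stats on test data") NaN instead of raising an error.
df = df.apply(lambda col: pd.to_numeric(col.str.strip(), errors="coerce"))

# Drop the row that came from the non-data line and renumber the index.
df = df.dropna().reset_index(drop=True)
print(df)
```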

or do it column by column:

df['Accuracy'] = pd.to_numeric(df['Accuracy']) # and so on
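A column-by-column variant that swaps in a different technique from the answer: instead of deleting letters, Series.str.extract pulls the first number out of each cell directly, so cells with no number (such as the "Stats on test data" row) come out as NaN without a separate coercion step. This is a sketch on an in-memory sample, not the answer's method:

```python
import io

import pandas as pd

raw = (
    "Accuracy 41.22%, error rate 0.00%, not classified 58.78%\n"
    "Stats on test data\n"
    "Accuracy 94.74%, error rate 5.26%, not classified 0.00%\n"
)
df = pd.read_csv(io.StringIO(raw), names=["Accuracy", "Error rate", "Not classified"])

for name in df.columns:
    # Keep only the first "digits[.digits]" run in each cell; cells without
    # one, and NaN cells, become NaN.
    number = df[name].str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    df[name] = pd.to_numeric(number)

# Drop the row that came from the non-data line.
df = df.dropna().reset_index(drop=True)
print(df)
```

Extracting is stricter than deleting: a stray digit inside a label would survive the replace approach but not change which number str.extract picks first.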

Original question on Stack Overflow: https://stackoverflow.com/questions/53683479/
