python - Apply a regex function on a Pandas dataframe

Tags: python regex pandas dataframe data-manipulation

I have a dataframe in Pandas, for example:

0                       1                   2
([0.8898668778942382    0.89533945283595]   0)
([1.2632564814188714    1.0207660696232244] 0)
([1.006649166957976     1.1180973832359227] 0)
([0.9653632916751714    0.8625538463644129] 0)
([1.038366333873932     0.9091449796555554] 0)

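All the values are strings. I want to remove all special characters and convert the values to double. I want to apply a function that removes every special character except the dot, such as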

import re
re.sub('[^0-9.]+', '',x)
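To see what that pattern does to a single cell, here is a minimal sketch on the first cell of the sample data (the string is copied from the table above; `cell`, `cleaned` and `value` are illustrative names):

```python
import re

# One raw cell from the question's dataframe (row 0, column 0).
cell = "([0.8898668778942382"

# Remove every run of characters that is not a digit or a dot.
cleaned = re.sub(r"[^0-9.]+", "", cell)

# The cleaned string can now be parsed as a float.
value = float(cleaned)

print(cleaned)  # 0.8898668778942382
print(value)
```

Note that a cell containing more than one dot would still survive the regex and then fail in `float()`; the sample data has at most one dot per cell.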

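So I want to apply it to every cell of the dataframe. How can I do that? I found the df.applymap function, but I don't know how to pass the string to it as an argument. I tried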

def remSp(x): 
    re.sub('^[0-9]+', '',x)

df.applymap(remSp())

but I don't know how to pass each cell to the function. Is there a better way to do this?
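The attempt above has three problems: `remSp` never `return`s the result of `re.sub`, `df.applymap(remSp())` calls the function with no argument instead of passing the function object, and the pattern `'^[0-9]+'` strips leading digits rather than the non-numeric characters. A corrected per-cell sketch, assuming the dataframe is rebuilt from the sample rows in the question. It also assumes pandas 2.1 or later names the element-wise method `DataFrame.map` (older versions only have `DataFrame.applymap`), so it picks whichever exists:

```python
import re

import pandas as pd

# Sample data copied from the question; every cell is a string.
df = pd.DataFrame(
    [
        ["([0.8898668778942382", "0.89533945283595]", "0)"],
        ["([1.2632564814188714", "1.0207660696232244]", "0)"],
        ["([1.006649166957976", "1.1180973832359227]", "0)"],
        ["([0.9653632916751714", "0.8625538463644129]", "0)"],
        ["([1.038366333873932", "0.9091449796555554]", "0)"],
    ]
)


def remSp(x):
    # Keep only digits and dots, and return the result so pandas can store it.
    return re.sub(r"[^0-9.]+", "", x)


# Pass the function object itself (no parentheses); pandas calls it once per cell.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(remSp).astype(float)
print(result)
```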

Thanks

Best answer

Why not just use the dataframe's built-in replace method with a regex, i.e.

df = df.replace(r'[^\d.]', '', regex=True).astype(float)

          0         1    2
0  0.889867  0.895339  0.0
1  1.263256  1.020766  0.0
2  1.006649  1.118097  0.0
3  0.965363  0.862554  0.0
4  1.038366  0.909145  0.0
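A self-contained version of the one-liner that rebuilds the sample frame (strings copied from the question) so it can be run as is. The `r'...'` prefix makes the pattern a raw string, so `\d` reaches the regex engine instead of being treated as a Python string escape:

```python
import pandas as pd

# Sample data copied from the question; every cell is a string.
df = pd.DataFrame(
    [
        ["([0.8898668778942382", "0.89533945283595]", "0)"],
        ["([1.2632564814188714", "1.0207660696232244]", "0)"],
        ["([1.006649166957976", "1.1180973832359227]", "0)"],
        ["([0.9653632916751714", "0.8625538463644129]", "0)"],
        ["([1.038366333873932", "0.9091449796555554]", "0)"],
    ]
)

# With regex=True, pandas treats the first argument as a regular expression
# and replaces every match inside each string cell with ''.
clean = df.replace(r"[^\d.]", "", regex=True).astype(float)
print(clean)
print(clean.dtypes)
```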

This replace approach is still faster than the other answers.
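To check the speed claim on your own data, here is a rough `timeit` sketch comparing the `replace` call with the per-cell `re.sub` approach from the question. The frame size (the 5 sample rows repeated 2,000 times) and the helper names `with_replace` and `with_elementwise` are arbitrary assumptions. No timings are quoted because they depend on the pandas version and the hardware:

```python
import re
import timeit

import pandas as pd

# Sample rows copied from the question; every cell is a string.
sample = [
    ["([0.8898668778942382", "0.89533945283595]", "0)"],
    ["([1.2632564814188714", "1.0207660696232244]", "0)"],
    ["([1.006649166957976", "1.1180973832359227]", "0)"],
    ["([0.9653632916751714", "0.8625538463644129]", "0)"],
    ["([1.038366333873932", "0.9091449796555554]", "0)"],
]
# Repeat the rows to get a larger frame (size chosen arbitrarily).
df = pd.DataFrame(sample * 2000)


def with_replace(frame):
    # The answer's approach: one regex replace over the whole frame.
    return frame.replace(r"[^\d.]", "", regex=True).astype(float)


def with_elementwise(frame):
    # The question's approach: call re.sub once per cell.
    # DataFrame.map exists in pandas >= 2.1; older versions use applymap.
    mapper = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
    return mapper(lambda x: re.sub(r"[^0-9.]+", "", x)).astype(float)


# Both approaches must produce the same frame before their speed is compared.
pd.testing.assert_frame_equal(with_replace(df), with_elementwise(df))

for fn in (with_replace, with_elementwise):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```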

A similar question about python - Apply a regex function on a Pandas dataframe can be found on Stack Overflow: https://stackoverflow.com/questions/46113355/
