python - Find columns containing a substring and replace it - Pandas

I have a problem with special characters in a data frame. For example:

ID  license     value1     value2   value3 ...
2     a       "5,120.000"    15%     45    ...
1     b       "3,246.440"    10%     65    ...
4     b       "1,890.220"    50%     10    ...
5     c       "2,005.240"    32%     12    ...

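The problem is that I have a lot of columns, about 150, so replacing the values one column at a time is not practical. I therefore need to replace all the special characters and convert the numbers (currently string columns) to float.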

I tried something like this:

def drop_percent(data):
    for el in data.columns:
        if data[el].astype(str).str.contains('%').any():
            data[el] = data[el].str.strip("%").astype(float)
    return data


def drop_commas(data):
    for el in data.columns:
        if data[el].astype(str).str.contains(',').any():
            data[el] = data[el].str.replace(',', '')
        if data[el].astype(str).str.contains('"').any():
            data[el] = data[el].str.replace('"', '')
            data[el] = data[el].astype(float)

    return data

The error I get is:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Best answer

Here is a simple, direct way to remove all special (non-word) characters from the columns, using a list comprehension and str.replace:

(pd.concat([df[col].astype(str).str.replace(r'\W+', '', regex=True)
           for col in df.columns], axis=1))

  license   value1 value2 value3
2       a  5120000     15     45
1       b  3246440     10     65
4       b  1890220     50     10
5       c  2005240     32     12
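Note that the pattern `\W+` also strips the decimal point, so "5,120.000" becomes 5120000, and the resulting columns are still strings. If the goal is float columns with the decimals intact, one possible variant is sketched below. It is only a sketch under a few assumptions: the sample frame is rebuilt from the question with ID as the index, `clean_numeric_columns` is a hypothetical helper name, and only commas, double quotes and percent signs need to be removed.

```python
import pandas as pd

# Sample data mirroring the question; ID is assumed to be the index.
df = pd.DataFrame(
    {
        "license": ["a", "b", "b", "c"],
        "value1": ['"5,120.000"', '"3,246.440"', '"1,890.220"', '"2,005.240"'],
        "value2": ["15%", "10%", "50%", "32%"],
        "value3": [45, 65, 10, 12],
    },
    index=pd.Index([2, 1, 4, 5], name="ID"),
)


def clean_numeric_columns(frame):
    """Hypothetical helper: strip commas, quotes and percent signs, then
    convert a column to float only if every value parses as a number."""
    out = frame.copy()
    for col in out.columns:
        # astype(str) makes the .str accessor safe on non-string columns too.
        stripped = out[col].astype(str).str.replace(r'[,"%]', "", regex=True)
        converted = pd.to_numeric(stripped, errors="coerce")
        # Leave text columns such as `license` unchanged.
        if converted.notna().all():
            out[col] = converted.astype(float)
    return out


cleaned = clean_numeric_columns(df)
print(cleaned)
print(cleaned.dtypes)
```

Calling `astype(str)` before `.str` avoids the AttributeError from the question, because the accessor then always sees string values. Percentages come out as plain numbers (15.0 rather than 0.15), and a column containing missing values or any non-numeric entry is left untouched, so those cases would need extra handling.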

Regarding "python - Find columns containing a substring and replace it - Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56129184/
