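python - How to increase the number of array elements iterated while the script is running?

My script strips unwanted strings such as "@#$!" and other characters from the elements of an array. It works as expected, but it is very slow when the Excel file has a large number of rows.

I tried using numpy to see whether it would speed things up, but I'm not very familiar with it, so I may be using it incorrectly.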

import numpy as np
import pandas as pd
from tqdm import tqdm

xls = pd.ExcelFile(path)
df = xls.parse("Sheet2")

TeleNum = np.array(df['telephone'].values)

def replace(orignstr):  # removes the unwanted string from numbers
    for elem in badstr:
        if elem in orignstr:
            orignstr = orignstr.replace(elem, '')
    return orignstr


for UncleanNum in tqdm(TeleNum):
    newnum = replace(str(UncleanNum))  # calling replace function
    df['telephone'] = df['telephone'].replace(UncleanNum, newnum)  # store string back in data frame

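I also tried removing the function and inlining its body as a single block of code to see whether that would help, but the speed stayed the same.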

for UncleanNum in tqdm(TeleNum):
    orignstr = str(UncleanNum)
    for elem in badstr:
        if elem in orignstr:
            orignstr = orignstr.replace(elem, '')
            print(orignstr)
    df['telephone'] = df['telephone'].replace(UncleanNum, orignstr)
TeleNum = np.array(df['telephone'].values)

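The current script processes an Excel file of about 200k rows at roughly 70it/s and takes about an hour to finish. That is not great, because this is only one of many functions.

I don't know much Python; I'm learning as I write scripts, so any pointers would be appreciated.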

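Edit:

Most of the array elements I'm dealing with are numbers, but some contain letters or symbols. I'm trying to remove all of those non-digit characters from the array elements.

For example: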

FD3459002912
*345*9002912$

Best answer

If you are trying to remove everything that is not a digit from a string, you can use re.sub directly, like this:

import re

string = "FD3459002912"
regex_result = re.sub(r"\D", "", string)  # \D matches any non-digit character
print(regex_result)  # 3459002912
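The same pattern can probably be applied to the whole column at once instead of looping over each row. The following is only a minimal sketch: the sample DataFrame is hypothetical and stands in for the "telephone" column read from Sheet2 in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the "telephone" column of Sheet2.
df = pd.DataFrame({"telephone": ["FD3459002912", "*345*9002912$", 3459002912]})

# Cast to str first (cells read from Excel may be numeric), then strip
# every non-digit character from the whole column in one vectorized call.
df["telephone"] = df["telephone"].astype(str).str.replace(r"\D", "", regex=True)

print(df["telephone"].tolist())
```

This avoids calling df['telephone'].replace(...) once per row, which rescans the entire column on every iteration. One caveat: if pandas parsed the column as float, astype(str) produces values such as "3459002912.0", and the trailing 0 would survive the cleanup, so it may be worth checking the column dtype first.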

Regarding "python - How to increase the number of array elements iterated while the script is running?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57390555/
