python - 高效地多重处理字符串数组

标签 python arrays string performance multiprocessing

我有一个需要处理的字符串数组。由于字符串可以独立处理，因此我并行执行此操作:

import multiprocessing
import numpy as np

def func(x):
    ls = ["this", "is"]
    return [i.upper() for i in x.split(' ') if i not in ls]

arr = np.asarray(["this is a test", "this is not a test", "see my good example"])
pool = multiprocessing.Pool(processes=2)
tst = pool.map(func, arr)
pool.close()

我的问题如下:是否有任何明显的方法可以在减少内存使用和 CPU 时间方面改进我的代码？比如

在 func 中使用 numpy 数组？
使用 Python 列表而不是 numpy 数组？
...？

最佳答案

您可以使用 numpy frompyfunc对整个执行进行矢量化。这比原生 Python 实现快得多。

import numpy as np
import functools


def func(x):    
    ls = ["this", "is"]
    print( [i.upper() for i in x.split(',') if i not in ls])


x = np.array(["this is a test", "this is not a test", "see my good example"])
np.frompyfunc(func,1,1)(x)

关于python - 高效地多重处理字符串数组，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/55948383/

上一篇：python - 使用 Beautiful Soup 和 Python 抓取 Asp.NET 网站

下一篇：python - 在 matplotlib 中编辑轴线

相关文章：

c# - 去除 Excel 工作表的非法字符

java - 有人可以解释在 String 类中被覆盖的 toString 方法的源代码吗？

python消耗内存时使用系统cpu时间

python - 合并 itertools.product 的结果？

javascript - JS - 在对象数组上使用 .every 方法

string - 如何在 Bash 中将字符串从大写转换为小写？

python - 如何根据从另一个表数据获取的id来获取表数据？ Django

python - Pylint 的 "Too few public methods"消息是什么意思？

c++ - 使用 constexpr 定义并声明 const std::array if

c - C 中的随机排列数组