python - The easiest way to write a parallel C/C++ module for use in Python

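Brief background (not essential)

I have been working on an alternative (less resource-demanding?) mean shift C++ module to the one in Scikit-Learn.

On the C++ side, I have been using the nanoflann library to build and search KD-trees.

Basically, I have two numpy arrays that I pass through Cython to my C++ MeanShift function, which then returns the list of cluster centers it found.

It turns out to be somewhat faster, by a factor of about 7 (I am still actively working on it).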

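My question:

I want to parallelize the most expensive parts of my C++ code, such as the for loop that iterates to convergence. However, since this C++ module will be imported into Python, I want to do it in the safest and simplest way.

I have considered using OpenMP. Do you have any suggestions?

Thanks! Have a nice day.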

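Edit / code snippet

Thanks @bivouac0, I can now compile the whole thing.

Now I am struggling with the logic/technical side. Let me show you a piece of the code I want to parallelize.

I have a std::vector<std::pair<size_t, double> > matches vector and a fairly large double samples[N] array. I want to use the first element of each pair stored in matches as an index into the larger samples array (see the code below). Here is the method that does this: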

#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::pair<size_t, double> > searchResultPair;

// Returns a newly allocated array {mean x, mean y, mean z}; the caller must delete[] it.
// samples stores the points as interleaved x, y, z triples.
double* calcMean(size_t nMatches, searchResultPair matches,
    double* samples) {
    double* returnArray = new double[3];
    double x = 0, y = 0, z = 0;  // running sums
    for (size_t i = 0; i < nMatches; i = i + 1) {
        // matches[i].first is the index of the matched point in samples
        x = x + samples[3 * (matches[i].first)];
        y = y + samples[3 * (matches[i].first) + 1];
        z = z + samples[3 * (matches[i].first) + 2];
    }
    returnArray[0] = x/nMatches;
    returnArray[1] = y/nMatches;
    returnArray[2] = z/nMatches;

    return(returnArray);
}

Is there a way to access the matches[i].first values concurrently?

I tried #pragma omp parallel for reduction(+:x,y,z) num_threads(n_threads), but it degrades performance (1 thread is faster than 2 threads, which is faster than 4, then 8, and so on...).
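For reference, the attempt described above could look roughly like the sketch below. It assumes samples holds the points as interleaved x, y, z triples, as in calcMean. The name calcMeanOmp, the std::array return type and the n_threads parameter are illustrative and not taken from the original module. When compiled without -fopenmp, the pragma is ignored and the loop simply runs serially.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::size_t, double> > searchResultPair;

// Hypothetical variant of calcMean with the OpenMP reduction applied.
// Each thread accumulates private x, y, z sums that OpenMP adds together
// when the loop finishes.
std::array<double, 3> calcMeanOmp(const searchResultPair& matches,
                                  const double* samples, int n_threads) {
    assert(samples != nullptr && n_threads > 0);
    (void)n_threads;  // unused when the pragma is ignored (no -fopenmp)

    double x = 0.0, y = 0.0, z = 0.0;
    const long n = static_cast<long>(matches.size());

    #pragma omp parallel for reduction(+:x,y,z) num_threads(n_threads)
    for (long i = 0; i < n; ++i) {
        const std::size_t idx = 3 * matches[i].first;  // start of the triple
        x += samples[idx];
        y += samples[idx + 1];
        z += samples[idx + 2];
    }

    if (n == 0) {
        return {{0.0, 0.0, 0.0}};  // no matches: avoid dividing by zero
    }
    return {{x / n, y / n, z / n}};
}
```

The loop body is only three loads and three additions per match, so for short match lists the cost of starting and synchronizing the thread team can plausibly outweigh the work itself, which would be consistent with the slowdown reported here.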

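Does my question make sense? Did I get something wrong somewhere? Perhaps managing a team of n_threads parallel threads to compute the partial sums x, y, z is simply overhead, since the elements of the vector are stored contiguously...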

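I could split the for loop above into 3 parts and try to parallelize each one. Would that be a good idea? The computation there is nested inside a while loop that is itself nested inside another for loop, and it is the most important method in the whole module.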
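One alternative that is often considered for a loop nest like this is to parallelize the outermost for loop instead, so that each thread runs a whole convergence loop for its own starting point. The sketch below is only an illustration under assumptions: the names Point, localMean and shiftSeeds are hypothetical, and a brute-force radius search stands in for the nanoflann KD-tree query. Whether this helps in the actual module is not established here.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::array<double, 3> Point;

// Stand-in for the KD-tree radius search: mean of all samples within
// `radius` of `c`. Returns `c` unchanged if no sample is in range.
static Point localMean(const Point& c, const std::vector<Point>& samples,
                       double radius) {
    Point sum = {{0.0, 0.0, 0.0}};
    std::size_t count = 0;
    for (std::size_t j = 0; j < samples.size(); ++j) {
        const double dx = samples[j][0] - c[0];
        const double dy = samples[j][1] - c[1];
        const double dz = samples[j][2] - c[2];
        if (dx * dx + dy * dy + dz * dz <= radius * radius) {
            sum[0] += samples[j][0];
            sum[1] += samples[j][1];
            sum[2] += samples[j][2];
            ++count;
        }
    }
    if (count == 0) return c;
    return {{sum[0] / count, sum[1] / count, sum[2] / count}};
}

// Shifts every seed until it moves less than `tol` or `maxIter` is reached.
// Iterations of the outer loop are independent: each one reads the shared,
// read-only `samples` and writes only its own element of `seeds`.
std::vector<Point> shiftSeeds(std::vector<Point> seeds,
                              const std::vector<Point>& samples,
                              double radius, int maxIter, double tol) {
    const long nSeeds = static_cast<long>(seeds.size());
    #pragma omp parallel for schedule(dynamic)
    for (long s = 0; s < nSeeds; ++s) {
        Point c = seeds[s];
        for (int it = 0; it < maxIter; ++it) {
            const Point m = localMean(c, samples, radius);
            const double shift = std::sqrt((m[0] - c[0]) * (m[0] - c[0]) +
                                           (m[1] - c[1]) * (m[1] - c[1]) +
                                           (m[2] - c[2]) * (m[2] - c[2]));
            c = m;
            if (shift < tol) break;  // converged
        }
        seeds[s] = c;
    }
    return seeds;
}
```

With this layout the thread team is created once per call rather than once per mean computation, and schedule(dynamic) is used because different seeds may need different numbers of iterations.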

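Best answer

Regarding the problem above of compiling C++ code with OpenMP, you probably just need to include the gomp library. Here is a simple setup.py script that works for me...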

from setuptools import setup, Extension
from Cython.Build import cythonize, build_ext
# run as... python setup.py build_ext

ext = Extension("entity_lookup",    # name of extension
    ["src/entity_lookup.pyx", "src/EntityLookupImpl.cpp", "src/IndexDictImpl.cpp"],
    language="c++",     # this causes Pyrex/Cython to create C++ source
    #include_dirs=[...],
    libraries=['gomp'], # or include explicitly with extra_link_args below
    #extra_link_args=['/usr/lib/x86_64-linux-gnu/libgomp.so.1'], # see above
    extra_compile_args=['-fopenmp', '-std=c++11']
)

setup(
    name = 'EntityLookup',
    version = '0.4',    # version must be a string
    description = 'Package to match words and phrases to Entity labels',
    cmdclass = {'build_ext': build_ext},    # cmdclass is a setup() option, not an Extension option
    ext_modules = cythonize(ext)
)

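Note the inclusion of gomp (a.k.a. libgomp.so.1). This is where GOMP_parallel is defined.

To compile... python setup.py build_ext

I always use this code in place (not installed anywhere). For that, you need to set up a link to the compiled entity_lookup.so, which ends up deep inside the "build" directory that the script creates.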
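Once the extension builds, it can be useful to confirm that -fopenmp actually took effect. A minimal sketch, assuming you add a small helper like the one below to one of the C++ sources and expose it through Cython (the function names are hypothetical): the _OPENMP macro is defined by the compiler only when OpenMP is enabled, and omp_get_max_threads() is part of the standard OpenMP runtime API.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// True if this translation unit was compiled with OpenMP enabled.
bool builtWithOpenMP() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Upper bound on the number of threads a parallel region would use,
// or 1 when compiled without OpenMP.
int openmpMaxThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

If builtWithOpenMP() returns false, the pragmas in the module are being ignored and the code runs on a single thread, regardless of any num_threads clause.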

For more on "python - The easiest way to write a parallel C/C++ module for use in Python", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/48193130/
