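Suppose we have two nested loops. The inner loop should run in parallel, but the outer loop needs to execute sequentially. The following code then does what we want: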
for (int i = 0; i < N; ++i) {
    #pragma omp parallel for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
        // Do some work
    }
}
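Now suppose that each thread has to obtain some thread-local object to do the work in the inner loop, and that obtaining these thread-local objects is expensive. We therefore do not want to do the following: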
for (int i = 0; i < N; ++i) {
    #pragma omp parallel for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
        ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly!
        // Do some work with the help of obj
    }
}
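How can I solve this problem?
Each thread should request its local object only once.
The inner loop should be parallelized across all threads.
The iterations of the outer loop should execute one after another.
My idea is the following, but does it really do what I want?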
#pragma omp parallel
{
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num());
    for (int i = 0; i < N; ++i) {
        #pragma omp for schedule(static)
        for (int j = first(i); j < last(i); ++j) {
            // Do some work with the help of obj
        }
    }
}
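One way to check the three requirements is a small instrumented program. The following is a minimal sketch, not the asker's code: `Store`, its `requests` counter, `run_pattern`, and the loop bounds `0..M` (standing in for `first(i)` and `last(i)`) are all hypothetical. It relies on the implicit barrier at the end of `omp for`, which keeps any thread from starting outer iteration `i + 1` before all threads have finished iteration `i`. The fallback definitions only exist so the sketch also builds without OpenMP support.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical stand-in for the question's ThreadLocalObject.
struct ThreadLocalObject { long work_done = 0; };

// Hypothetical instrumented store: counts requests per thread slot.
struct Store {
    std::vector<ThreadLocalObject> objects;
    std::vector<int> requests;
    explicit Store(int n) : objects(n), requests(n, 0) {}
    // Stand-in for the costly GetTLO(); each thread touches only its own slot.
    ThreadLocalObject &GetTLO(int tid) { ++requests[tid]; return objects[tid]; }
};

// Runs the pattern from the question. Returns true if no inner iteration of
// outer step i ran before step i - 1 had finished all M inner iterations.
bool run_pattern(Store &store, int N, int M) {
    assert(N >= 0 && M >= 0);
    std::vector<int> finished_vec(N, 0);
    int *finished = finished_vec.data();  // inner iterations completed per i
    int violations = 0;
    #pragma omp parallel
    {
        // Requested once per thread, outside both loops.
        ThreadLocalObject &obj = store.GetTLO(omp_get_thread_num());
        for (int i = 0; i < N; ++i) {
            #pragma omp for schedule(static)
            for (int j = 0; j < M; ++j) {
                // The barrier ending the previous omp for means nobody is
                // still updating finished[i - 1] at this point.
                if (i > 0 && finished[i - 1] != M) {
                    #pragma omp atomic
                    ++violations;
                }
                ++obj.work_done;
                #pragma omp atomic
                ++finished[i];
            }
            // Implicit barrier: all threads finish step i before step i + 1.
        }
    }
    return violations == 0;
}
```

After calling `run_pattern(store, N, M)` on a `Store` sized with `omp_get_max_threads()`, every entry of `store.requests` should be at most 1, and the `work_done` counters should sum to `N * M`.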
Best answer
I don't really see why the complexity of threadprivate is necessary when you can simply use a pool of objects. The basic idea should be along these lines:
#pragma omp parallel
{
    // Will hold a handle to the object pool
    auto pool = shared_ptr<ObjectPool>(nullptr);
    #pragma omp single copyprivate(pool)
    {
        // A single thread creates a pool of num_threads objects
        // Copyprivate broadcasts the handle
        pool = create_object_pool(omp_get_num_threads());
    }
    for (int i = 0; i < N; ++i)
    {
        // Work-sharing "omp for" reuses the enclosing team; a nested
        // "parallel for" here would open a new region in every thread
        #pragma omp for schedule(static)
        for (int j = first(i); j < last(i); ++j)
        {
            // The object is not re-created, just a reference to it
            // is returned from the pool
            auto & r = pool->get( omp_get_thread_num() );
            // Do work with r
        }
    }
}
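The answer leaves `ObjectPool` and `create_object_pool` undefined. The following is a minimal sketch of what they might look like, assuming a pool that simply owns one object per thread. `Worker`, `total()`, `process`, and the loop bounds `0..M` are hypothetical names introduced only for illustration. The handle is fetched once per thread before the loops here; because the lookup is cheap, it could equally stay inside the inner loop.

```cpp
#include <cassert>
#include <memory>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical pooled object; counts the items it has processed.
struct Worker { long items = 0; };

// Hypothetical pool: owns one Worker per thread, handed out by thread id.
class ObjectPool {
public:
    explicit ObjectPool(int n) : workers_(n) {}
    Worker &get(int tid) { return workers_[tid]; }
    long total() const {
        long sum = 0;
        for (const Worker &w : workers_) sum += w.items;
        return sum;
    }
private:
    std::vector<Worker> workers_;
};

std::shared_ptr<ObjectPool> create_object_pool(int n) {
    return std::make_shared<ObjectPool>(n);
}

// Sequential outer loop, work-shared inner loop, one pool for the whole team.
long process(int N, int M) {
    assert(N >= 0 && M >= 0);
    std::shared_ptr<ObjectPool> keep;  // shared; keeps the pool for the final sum
    #pragma omp parallel
    {
        std::shared_ptr<ObjectPool> pool;  // private handle in each thread
        #pragma omp single copyprivate(pool)
        {
            // One thread builds the pool; copyprivate copies the handle
            // into every other thread's private variable.
            pool = create_object_pool(omp_get_num_threads());
            keep = pool;
        }
        Worker &w = pool->get(omp_get_thread_num());
        for (int i = 0; i < N; ++i) {
            #pragma omp for schedule(static)
            for (int j = 0; j < M; ++j) {
                ++w.items;  // stands in for the real work
            }
        }
    }
    return keep->total();
}
```

Under these assumptions, `process(N, M)` returns `N * M`, since each inner iteration is executed by exactly one thread.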
Regarding c++ - parallelizing an inner loop with OpenMP, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29610417/