c++ - The simplest example of running in parallel with OpenMP

Tags: c++ c multithreading parallel-processing openmp

Consider the following code construct:

int n = 0;

  #pragma omp parallel for collapse(2)
  for (int i = 0; i < 3; i++)
     for(int j = 0; j < 3; j++)
       n++;

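The above is the simplest demonstration of something similar that I am trying to do in code that takes a lot of time to run. The main goal is to parallelize the loops so that the runtime goes down.

I am new to OpenMP; I just know a few commands and that's all. In the code above, the final result comes out wrong (n = 9 is the right answer). I guess the loops are trying to access the same memory location simultaneously.

Can someone give the simplest possible solution? Please note that I am very much a beginner at this. Any reading material on the topic would also help. Thank you.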

Best answer

I guess, the loops are trying to access the same memory location simultaneously.

TL;DR: Yes, there is a race condition during the updates of the variable n. One way to solve it is to use the OpenMP reduction clause.

I am new to OpenMP, just know some commands and that's all. Now in the code I have written above, the final result comes wrong (n = 9 is the right answer).

The longer answer:

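#pragma omp parallel for creates a parallel region, and the iterations of the loop it encloses are distributed among the threads of that region using the default chunk size and the default schedule, which is typically static. Bear in mind, however, that the default schedule may differ between concrete implementations of the OpenMP standard.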
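To see how iterations end up on threads, the following minimal sketch records which thread executed each iteration. The function name iteration_owners is hypothetical, and the schedule is requested explicitly as static here rather than relying on the implementation's default. Each iteration writes only its own array element, so this loop has no race.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>  // omp_get_thread_num(); only available when OpenMP is enabled
#endif

// Hypothetical helper: returns, for every iteration index, the id of the
// thread that executed it. Without OpenMP every entry is 0.
std::vector<int> iteration_owners(int count) {
    std::vector<int> owner(count, -1);

    // schedule(static) is spelled out so the sketch does not depend on
    // whatever default schedule the implementation picks.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < count; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;  // serial fallback: a single "thread"
#endif
    }
    return owner;
}
```

Printing the result of iteration_owners(9) on a multi-core machine typically shows the indices split into a few blocks, one per thread; the exact split depends on the number of threads and on the implementation.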

A more formal description can be found in the OpenMP 5.1 specification:

The worksharing-loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed across threads that already exist in the team that is executing the parallel region to which the worksharing-loop region binds.

Moreover,

The parallel loop construct is a shortcut for specifying a parallel construct containing a loop construct with one or more associated loops and no other statements.

Or, informally, #pragma omp parallel for is a combination of the construct #pragma omp parallel with the construct #pragma omp for.
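As a small illustration of that equivalence, the sketch below writes the same loop once with the combined construct and once with the two constructs spelled out. The function names squares_combined and squares_split are hypothetical; the loop deliberately writes a distinct array element per iteration so that it is free of races.

```cpp
#include <vector>

// Combined form: one directive creates the team of threads and
// distributes the loop iterations among them.
std::vector<int> squares_combined(int count) {
    std::vector<int> out(count);
    #pragma omp parallel for
    for (int i = 0; i < count; i++)
        out[i] = i * i;  // each iteration writes its own element
    return out;
}

// Split form: "parallel" creates the team, "for" shares the loop
// among the threads that already exist in that team.
std::vector<int> squares_split(int count) {
    std::vector<int> out(count);
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < count; i++)
            out[i] = i * i;
    }
    return out;
}
```

Both functions should return the same vector. The split form becomes useful when the parallel region has to contain more than a single loop.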

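So what happens in your code is that multiple threads modify the value of n at the same time. To solve this, you should use the OpenMP reduction clause, about which the OpenMP standard says: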

The reduction clause can be used to perform some forms of recurrence calculations (...) in parallel. For parallel and work-sharing constructs, a private copy of each list item is created, one for each implicit task, as if the private clause had been used. (...) The private copy is then initialized as specified above. At the end of the region for which the reduction clause was specified, the original list item is updated by combining its original value with the final value of each of the private copies, using the combiner of the specified reduction-identifier.
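The behaviour described in that quote can be imitated by hand, which may make it easier to picture. The following is only a conceptual sketch, not how any particular OpenMP runtime implements reduction: each thread counts into its own private variable (named local here, a hypothetical name, as is the function name), and the private values are combined into n at the end of the region.

```cpp
// Hand-written imitation of reduction(+:n); a sketch for illustration only.
int count_with_manual_reduction(int rows, int cols) {
    int n = 0;

    #pragma omp parallel
    {
        // Private copy, one per thread, initialized to 0 (the identity of +).
        int local = 0;

        #pragma omp for collapse(2)
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                local++;  // no race: only this thread touches "local"

        // Combine step: add this thread's private copy to the original
        // variable, one thread at a time.
        #pragma omp atomic
        n += local;
    }
    return n;
}
```

Calling count_with_manual_reduction(3, 3) yields 9 regardless of the number of threads. In practice the reduction clause is shorter and leaves the combining strategy to the runtime.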

For a more detailed explanation of how the reduction clause works, take a look at this SO thread.

So, to fix the race condition in your code, just change it to:

 int n = 0;

  #pragma omp parallel for collapse(2) reduction(+:n)
  for (int i = 0; i < 3; i++)
     for(int j = 0; j < 3; j++)
        n++;
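Since the question mentions that the real code does heavier work, here is a hedged sketch of the same pattern wrapped in functions. count_iterations mirrors the fix above, and sum_of_products applies the same clause to an accumulation that does some arithmetic; both names are hypothetical.

```cpp
// Same fix as above, wrapped in a function so it can be reused and checked.
int count_iterations(int rows, int cols) {
    int n = 0;
    #pragma omp parallel for collapse(2) reduction(+:n)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            n++;
    return n;
}

// The same clause works for any accumulation with +, for example
// summing i * j over the whole iteration space.
long long sum_of_products(int rows, int cols) {
    long long sum = 0;
    #pragma omp parallel for collapse(2) reduction(+:sum)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += static_cast<long long>(i) * j;
    return sum;
}
```

With GCC, OpenMP has to be enabled with the -fopenmp flag; without it the pragmas are ignored and the loops simply run serially, still producing the same results.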

Regarding "c++ - The simplest example of running in parallel with OpenMP", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65833061/
