c++ - 基于并行位操作的 openMP 素数查找器？

最近在学习C++。素数查找器是一个练习。我写完串行版代码后，尝试用openmp并行化代码。然而，串行版本工作正常但是一旦我尝试了 openmp 版本，输出(1 到 100 之间的质数)是非常错误的。有什么建议吗？

这是我的代码(相关函数):

    void prime2_bits(const int& n)
    {
       omp_set_num_threads(4);
       int i,j,tp,sq=sqrt(n+1);
       int p[n/32+1];
       initint(p,n/32+1);
    #pragma omp parallel for default(shared) private(i,j,tp) schedule(dynamic)
       for(i=2;i<sq;i++){
         if(!((p[i/32]>>(i%32))&1)){ // see whether the ith bit is 0
           tp=i*i;
           for(j=tp;j<n+1;j+=i){
             p[j/32]|=(1<<(j%32)); // set the jth bit as 1
           }
         }
       }
    // Do the printing job
       for(i=2;i<n+1;i++)
         if(!((p[i/32]>>(i%32))&1)) cout<<i<<' ';
    }

输出如下:

using method 2 (bits to save space):
2 3 5 7 11 13 15 17 19 21 23 27 29 31 35 37 41 43 47 49 53 55 59 61 67 69 71 73 77 79 81 83 87 89 91 93 97 99 0 msec used!

另外，为了提高效率，我又修改了串口代码。然后尝试将它与 openmp 并行，但没有成功。有什么建议吗？

修改后的代码:

void prime3_norev(const int& n)
{
   int i,j,tp,m=n+1;
   int pi=0;
   bool pFlag[m];
   int p[n/3];
   omp_set_num_threads(4);
   initbool(pFlag,m);
   initint(p,n/3);
#pragma omp parallel for default(shared) private(i,j) schedule(dynamic) 
   for(i=2;i<m;i++){
     if(!pFlag[i]) p[pi++]=i;
     for(j=0;(j<pi)&&(i*p[j]<m);j++){
       pFlag[i*p[j]]=true;
       if(i%p[j]==0) break;
     }
   }
// Do the printing job
   for(i=0;i<pi;i++)
     cout<<p[i]<<' ';
}

和相应的输出:

using method 3 (no revisits, space for time):
2 3 6 7 11 13 15 19 23 29 31 35 37 41 43 47 53 55 59 61 65 67 71 73 79 83 85 89 95 97 0 msec used!

代码编译:

g++ -O2 -fopenmp fib.cc

非常感谢!

最佳答案

要修复您的代码，您可以使用 ordered 子句

#pragma omp parallel for schedule(dynamic) ordered
for(int i=2;i<m;i++) {
    #pragma omp ordered
    if(!pFlag[i]) p[pi++]=i;
    //rest of code

有关工作示例，请参阅 http://coliru.stacked-crooked.com/a/fa1eebf126940fb0 .我在那里使用 schedule(static) 只是为了在没有 ordered 子句的情况下更容易地显示错误。

不过，我不确定订购会对性能产生什么影响。在我看来，您正在尝试并行化 Eratosthenes 筛法。在那种情况下，我建议您查看 http://create.stephan-brumme.com/eratosthenes/ .它使用 OpenMP 在大约 1 秒内找到最多 10^9 的所有素数。

关于c++ - 基于并行位操作的 openMP 素数查找器？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/22855789/

c++ - 基于并行位操作的 openMP 素数查找器？

上一篇：c++ - 从派生类创建复制构造函数和赋值运算符

下一篇：c++ - 如何将 ifstream 或字符串输入转换为 char？