c++ - OpenACC sum reduction outputs an incrementing sum on every execution

#include <iostream>

int main(int argc, char const *argv[])
{
    int sum = 0;
    int *array;
    array = new int [100];

    #pragma acc enter data create(array[0:100],sum)  // allocates on the device, copies nothing

    #pragma acc parallel loop present(array[0:100])
    for (int i = 0; i < 100; ++i)
        array[i] = 1;

    #pragma acc parallel loop present(array[0:100],sum) reduction(+:sum)
    for (int i = 0; i < 100; ++i)
        sum += array[i];

    #pragma acc exit data delete(array[0:100]) copyout(sum)

    std::cout << sum << std::endl;

    delete[] array;  // release the host allocation
    return 0;
}

$ pgcpp -acc -Minfo main.cpp
      7, Generating enter data create(sum)
         Generating enter data create(array[:100])
         Generating present(array[:100])
         Accelerator kernel generated
         12, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */
      7, Generating Tesla code
     15, Generating present(array[:100])
         Generating present(sum)
         Accelerator kernel generated
         18, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */
         20, Sum reduction generated for sum
     15, Generating Tesla code
     25, Generating exit data copyout(sum)
         Generating exit data delete(array[:100])
$ ./a.out
$ ./a.out
$ ./a.out
$ ./a.out

According to the OpenACC standard:

On an exit data directive, the data is copied back to the local memory and deallocated.

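sum does not seem to be deallocated; instead it appears to be reused (and incremented) on every run of the program. Moreover, the + operator in the reduction clause initializes the reduction variable to 0, so even if sum is not deallocated between executions, I would expect the result to start from 0 each time.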

I can avoid this behavior by using copyin instead of create for sum in the enter data directive, or by setting sum = 0 in a single-gang, single-worker kernel:

#pragma acc parallel present(sum) num_gangs(1) num_workers(1)
sum = 0;



You have misunderstood what the initialization value of the reduction operator means. Quoting the OpenACC specification, pages 20-21:

The reduction clause is allowed on the parallel construct. It specifies a reduction operator and one or more scalar variables. For each variable, a private copy is created for each parallel gang and initialized for that operator. At the end of the region, the values for each gang are combined using the reduction operator, and the result combined with the value of the original variable and stored in the original variable.

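This means the overall reduction is split into parts, and each part is handled by one gang. Within a gang, the private copy of the reduction variable starts from the operator's initialization value (0 for +). When the final result is produced, however, the per-gang results are combined with the value of the original variable (sum in your case), and that combined value is the result. Because create allocates sum on the device without initializing it, the device copy holds whatever was already in that memory, which is what gets added to the reduction.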
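To make the quoted semantics concrete, here is a minimal host-only sketch that models them in plain C++. It does not use OpenACC at all; the function name `simulate_sum_reduction`, the `num_gangs` parameter, and the round-robin split of the work are assumptions made for illustration, not how any particular compiler schedules gangs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of the reduction(+:sum) semantics quoted above.
// Hypothetical helper: name, signature, and work split are illustrative only.
int simulate_sum_reduction(const std::vector<int>& data,
                           int original,
                           std::size_t num_gangs)
{
    assert(num_gangs > 0 && "need at least one gang");

    // Each gang gets a private copy initialized for the operator (0 for +).
    std::vector<int> partial(num_gangs, 0);
    for (std::size_t i = 0; i < data.size(); ++i)
        partial[i % num_gangs] += data[i];

    // At the end of the region the per-gang values are combined, and the
    // result is combined with the value the original variable already held.
    int result = original;
    for (int p : partial)
        result += p;
    return result;
}
```

With 100 ones as input, an original value of 0 yields 100, while an original value of 100 (for example, left over in uninitialized device memory) yields 200, which matches the incrementing totals described in the question.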

So you must initialize sum properly, perhaps using one of the methods you outlined in the question.
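As a sketch of the first of those methods, the question's program can be changed to use copyin for sum, so the device copy starts from the host value 0. The wrapper function `sum_with_copyin` and the constant `n` are names introduced here for illustration; the directives themselves are the ones from the question with create(sum) swapped for copyin(sum).

```cpp
// Sketch of the copyin variant. Build with an OpenACC compiler (the question
// uses the -acc flag); without OpenACC support the pragmas are ignored and
// the loops simply run on the host.
int sum_with_copyin()
{
    const int n = 100;
    int sum = 0;                  // host value that copyin transfers to the device
    int *array = new int[n];

    // copyin(sum) instead of create(sum): the device copy is initialized to 0
    #pragma acc enter data create(array[0:n]) copyin(sum)

    #pragma acc parallel loop present(array[0:n])
    for (int i = 0; i < n; ++i)
        array[i] = 1;

    // The per-gang partial sums are combined with the device value of sum (now 0)
    #pragma acc parallel loop present(array[0:n], sum) reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += array[i];

    #pragma acc exit data delete(array[0:n]) copyout(sum)

    delete[] array;
    return sum;
}
```

Under the semantics quoted above, this should return 100 on every run, since the original value folded into the reduction is now a known 0 rather than leftover device memory.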


