c - OpenACC: passing a list of structs

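I have a C program that checks whether two sets of polygons overlap. The user supplies 2 sets of polygons (each set contains thousands of polygons), and the program determines which polygons in set1 overlap which polygons in set2.

I have two structs like this: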

struct gpc_vertex  /* Polygon vertex */
{
    double          x;
    double          y;
};

struct gpc_vertex_list  /* Polygon contour */
{
    int pid;    // polygon id
    int             num_vertices;
    double *mbr;   // minimum bounding rectangle of the polygon, so always 4 elements

};

I have the following code segment:

#pragma acc kernels copy(listOfPolygons1[0:polygonCount1], listOfPolygons2[0:polygonCount2], listOfBoolean[0:dump])
for (i=0; i<polygonCount1; i++){
    polygon1 = listOfPolygons1[i];

    for (j=0; j<polygonCount2; j++){

        polygon2 = listOfPolygons2[j];
        idx = polygonCount2 * i + j;

        listOfBoolean[idx] = isRectOverlap(polygon1.mbr, polygon2.mbr);  // line 115

    }
}

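listOfPolygons1 and listOfPolygons2 are (as the names imply) arrays of gpc_vertex_list. listOfBoolean is an array of int.
The mbr of the two polygons is checked for overlap: the function "isRectOverlap" returns 1 if they overlap and 0 otherwise, and the value is stored in listOfBoolean.

Problem
The code compiles but does not run. It fails with the following error: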

call to cuEventSynchronize returned error 700: Illegal address during kernel execution

My observations
If I change line 115 to the following, the program compiles and runs:

isRectOverlap(polygon1.mbr, polygon2.mbr); // without assigning value to listOfBoolean

or this:

listOfBoolean[idx] = 5; // assigning an arbitrary value

(the result is wrong, but at least it runs)

Question
Neither "isRectOverlap" nor "listOfBoolean" seems to cause the problem on its own, as long as the return value of "isRectOverlap" is not stored in "listOfBoolean".
Does anyone know why the program fails when I assign the return value of "isRectOverlap" to "listOfBoolean"?

The isRectOverlap function looks like this:

int isRectOverlap(double *shape1, double *shape2){

    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]){
        return 0;
    }

    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]){
        return 0;
    }

    return 1;

}

The program runs without problems when it is not built with OpenACC.

Thanks for any help.

Best answer

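When an aggregate data type is used in an OpenACC data clause, a shallow copy of the type is performed. What is most likely happening here is that when the listOfPolygons arrays are copied to the device, "mbr" still contains a host address. The program therefore reports an illegal address error when it accesses "mbr" on the device.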
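To make the shallow-copy point concrete, here is a host-only analogy in plain C. It is an illustration under stated assumptions, not what the OpenACC runtime literally executes: memcpy stands in for the transfer of the struct array, and the helper names shallow_copy and shares_mbr_storage are hypothetical.

```c
#include <string.h>

struct gpc_vertex_list {
    int pid;
    int num_vertices;
    double *mbr;   /* only the address is stored inside the struct */
};

/* Byte-for-byte (shallow) copy of a struct array. The pointer value in
 * mbr is copied, but the doubles it points to are not. */
void shallow_copy(struct gpc_vertex_list *dst,
                  const struct gpc_vertex_list *src, int n)
{
    memcpy(dst, src, (size_t)n * sizeof *src);
}

/* Returns 1 if both structs refer to the same mbr storage. */
int shares_mbr_storage(const struct gpc_vertex_list *a,
                       const struct gpc_vertex_list *b)
{
    return a->mbr == b->mbr;
}
```

After shallow_copy, the copy's mbr still points into the original's memory. On the host that is harmless; on a device with a separate address space, the same pointer value would refer to memory the device cannot reach.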

Given the comment that "mbr" always has 4 elements, the simplest fix is to make "mbr" a fixed-size array of size 4.
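A minimal sketch of that first option, assuming the struct can be changed freely. The wrapper function findOverlaps, the "routine seq" directive, and the switch from "kernels copy" to "parallel loop" with copyin/copyout are my assumptions, not part of the original code.

```c
/* Variant of gpc_vertex_list with the MBR stored inline, so a shallow
 * copy of the struct array carries the four values with it. */
struct gpc_vertex_list {
    int pid;            /* polygon id */
    int num_vertices;
    double mbr[4];      /* minimum bounding rectangle, always 4 elements */
};

/* Same logic as the question's isRectOverlap; marked as a device routine. */
#pragma acc routine seq
int isRectOverlap(const double *shape1, const double *shape2)
{
    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]) return 0;
    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]) return 0;
    return 1;
}

/* Hypothetical wrapper around the question's loop nest. */
void findOverlaps(struct gpc_vertex_list *listOfPolygons1, int polygonCount1,
                  struct gpc_vertex_list *listOfPolygons2, int polygonCount2,
                  int *listOfBoolean)
{
    #pragma acc parallel loop collapse(2) \
        copyin(listOfPolygons1[0:polygonCount1], listOfPolygons2[0:polygonCount2]) \
        copyout(listOfBoolean[0:polygonCount1 * polygonCount2])
    for (int i = 0; i < polygonCount1; i++) {
        for (int j = 0; j < polygonCount2; j++) {
            listOfBoolean[polygonCount2 * i + j] =
                isRectOverlap(listOfPolygons1[i].mbr, listOfPolygons2[j].mbr);
        }
    }
}
```

Because the struct no longer contains a pointer, the default shallow copy is sufficient. Without an OpenACC compiler the directives are ignored and the function runs serially on the host, which is a convenient way to check the results.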

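Assuming you are using the PGI compiler on an NVIDIA device, a second approach is to use CUDA Unified Memory by compiling with "-ta=tesla:managed". All dynamic memory will then be managed by the CUDA runtime, which allows host addresses to be accessed on the device. The caveats are that it only works for dynamically allocated data, that your whole program can only use as much memory as is available on the device, and that it may slow down your program. http://www.pgroup.com/lit/articles/insider/v6n2a4.htm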

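The third option is to perform a deep copy of the aggregate type to the device. I can post an example if you decide to go this route. I also covered this topic in a talk I gave at GTC2015: https://www.youtube.com/watch?v=rWLmZt_u5u4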
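The answer does not include its deep-copy example, so the following is only a sketch of one common manual pattern, not the answerer's code. It assumes the compiler attaches the device copy of each "mbr" buffer to its parent struct when the parent is already present on the device (behavior specified from OpenACC 2.6; on older compilers this is an assumption). The helper names copyPolygonsToDevice, deletePolygonsFromDevice, and findOverlapsDeep are hypothetical.

```c
struct gpc_vertex_list {
    int pid;
    int num_vertices;
    double *mbr;   /* dynamically allocated, always 4 elements */
};

#pragma acc routine seq
int isRectOverlap(const double *shape1, const double *shape2)
{
    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]) return 0;
    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]) return 0;
    return 1;
}

/* Copy the struct array first (shallow), then each mbr buffer. */
void copyPolygonsToDevice(struct gpc_vertex_list *list, int count)
{
    #pragma acc enter data copyin(list[0:count])
    for (int i = 0; i < count; i++) {
        #pragma acc enter data copyin(list[i].mbr[0:4])
    }
}

/* Release in reverse order: members first, then the parent array. */
void deletePolygonsFromDevice(struct gpc_vertex_list *list, int count)
{
    for (int i = 0; i < count; i++) {
        #pragma acc exit data delete(list[i].mbr[0:4])
    }
    #pragma acc exit data delete(list[0:count])
}

void findOverlapsDeep(struct gpc_vertex_list *list1, int count1,
                      struct gpc_vertex_list *list2, int count2,
                      int *listOfBoolean)
{
    copyPolygonsToDevice(list1, count1);
    copyPolygonsToDevice(list2, count2);

    #pragma acc parallel loop collapse(2) \
        present(list1[0:count1], list2[0:count2]) \
        copyout(listOfBoolean[0:count1 * count2])
    for (int i = 0; i < count1; i++) {
        for (int j = 0; j < count2; j++) {
            listOfBoolean[count2 * i + j] =
                isRectOverlap(list1[i].mbr, list2[j].mbr);
        }
    }

    deletePolygonsFromDevice(list2, count2);
    deletePolygonsFromDevice(list1, count1);
}
```

The order matters in this pattern: the parent array has to be on the device before its members are copied, and the members are deleted before the parent. Compiled without OpenACC, the directives are ignored and the code runs on the host unchanged.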

Regarding c - OpenACC: passing a list of structs, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38779782/