c - CUDA 上的 BFS 实现问题

已关闭。此问题需要 debugging details 。目前不接受答案。

编辑问题以包含 desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem 。这将有助于其他人回答问题。

已关闭 7 年前。

Improve this question

我有以下使用 CUDA 7.5 运行并行 BFS 的算法。该函数接受一个边数组和一个顶点数组。一条边定义为

typedef struct Edge
{
    int first;
    int second;

}Edge;

除了起始顶点 0 之外，所有顶点的顶点数组都初始化为 -1。因此我们有类似的内容

 1 2 3 4 5 6.....1024
-1-1-1-1 0-1.....-1

在本例中起始顶点是 4 假设一个稀疏图，边缘列表将以以下方式包含数据(数据将在边缘内容中，但我提供了整数表示)

1 2  3  4 5 .......2048
3 17 12 1 3........2010
9 34 20 9 17.......196

BFS 应该与 2048 个线程并行运行，任何以 0 作为第一/第二索引的线程写入顶点数组 1 中的相关索引，并将 bool 修改为 1。这是 BFS 代码。

__global__ void bfs(Edge* edges, int* vertices, int current_depth, int* modified){

    int e = blockDim.x * blockIdx.x + threadIdx.x;
    int vfirst = edges[e].first;
    if (vfirst > 1023) {printf("oops %d:%d\n", e, vfirst); return;}
    int dfirst = vertices[vfirst];
    int vsecond = edges[e].second;
    if (vsecond > 1023) {printf("oops %d:%d\n", e, vsecond); return;}
    int dsecond = vertices[vsecond];

    if((dfirst == current_depth) && (dsecond == -1)){
        vertices[vsecond] = current_depth;
        printf("e:%d  depth:%d\n", e, current_depth);
        __syncthreads();
        *modified = 1;
        printf("%d\n", *modified);
    }else if((dsecond == current_depth) && (dfirst == -1)){
        vertices[vfirst] = current_depth;
        printf("e:%d depth:%d\n", e, current_depth);
        __syncthreads();
        *modified = 1;
        printf("%d\n", *modified);
    }
}

主代码重复调用此 BFS 内核，每次调用时都会增加当前深度的值。这是调用代码的相关部分。

begin = clock();

    do{

        h_modified = 0;
        //printf("Entered while loop\n");
        err = cudaMemcpy(d_modified, &h_modified, sizeof(int), cudaMemcpyHostToDevice);
        if (err != cudaSuccess)
        {
            fprintf(stderr, "Failed to copy h_done to device(error code %s)!\n", cudaGetErrorString(err));
            exit(EXIT_FAILURE);
        }

        printf("CUDA kernel launching with %d blocks of %d threads\n", edgeBlocks, threadsPerBlock);

        bfs<<<edgeBlocks, threadsPerBlock>>>(d_edges, d_vertices, current_depth, d_modified);
        cudaThreadSynchronize();

        err = cudaGetLastError();
        if (err != cudaSuccess)
        {
            fprintf(stderr, "Failed to launch bfs kernel (error code %s)!\n", cudaGetErrorString(err));
            exit(EXIT_FAILURE);
        }
        //printf("Second kernel launch finished\n");

        err = cudaMemcpy(&h_modified, d_modified, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d\n", h_modified);
        if (err != cudaSuccess)
        {
            fprintf(stderr, "Failed to copy d_done to host(error code %s)!\n", cudaGetErrorString(err));
            exit(EXIT_FAILURE);
        }

        printf("BFS run for level %d\n", current_depth);
        current_depth++;


    }while(h_modified != 0);

    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Time taken: %f\n", time_spent);

memcpy 和 malloc 均已正确并经过检查。我的问题是，在第一次运行 BFS 时，修改后的变量永远不会被修改为 1，因此该函数永远不会被第二次调用。我已经彻底检查了我的逻辑，但似乎无法指出问题所在。这是我在 CUDA 中的第一个完整项目，因此我们将不胜感激。对于那些想要完整可验证示例的人，请使用链接，

https://github.com/soumasish/ParallelBreadthFirstSearch

最佳答案

My problem is that in the first run of BFS the modified variable never gets modified to 1 and thus the function never gets called a second time.

遇到的第一个问题是越界访问，如 cuda-memcheck 所示。这是由于 edges 数组错误地包含了 1024 的索引，这是无效的，因为 vertices 数组的大小是 1024。修复方法是修改edges 数组，以便不存在大于 1023 的值。
发现的另一个主要问题是 current_depth 变量最初被设置为 1(在 main.cu 中)。这会阻止满足 bfs 内核中的任一 if 条件。 if 条件:
```
    if((dfirst == current_depth) &&...
```
取决于从与 current_depth 值匹配的 vertices 数组中检索的值 (dfirst)。但由于 vertices 数组的初始数量全部为 0 或 -1(如问题描述中所述)，因此不可能满足第一个 if 条件内核启动。因此，modified 变量永远不会更改为 1，因此不会发生额外的内核启动。修复方法是在 main.cu 中将 current_depth 初始设置为零。
此外，在 main.cu 中观察到以下代码行:
```
err = cudaMemcpy(&h_vertices, d_vertices, VERTEX_BYTES, cudaMemcpyDeviceToHost);
```
由于h_vertices已经是一个指针，所以在这里取它的地址是不正确的，所以&符号在这里不合适。按原样使用该代码将导致程序中堆栈损坏。修复方法是删除 & 符号。

关于c - CUDA 上的 BFS 实现问题，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/34319203/

c - CUDA 上的 BFS 实现问题

上一篇：c - mingw 获取光标位置(控制台)

下一篇：c - 数组映射选定的内存区域