c - Passing variables between host and device in CUDA

I have the following CUDA kernel, which performs a breadth-first search.

__global__ void bfs(const Edge* edges, int* vertices, int* current_depth, bool* done){

    int e = blockDim.x * blockIdx.x + threadIdx.x;
    int vfirst = edges[e].first;
    int dfirst = vertices[vfirst];
    int vsecond = edges[e].second;
    int dsecond = vertices[vsecond];

    if((dfirst == *current_depth) && (dsecond == -1)){
        vertices[vsecond] = dfirst +1;
        *current_depth = dfirst+1;
        *done = false;
    }
    if((dsecond == *current_depth) && (dfirst == -1)){
        vertices[vfirst] = dsecond + 1;
        *current_depth = dsecond +1;
        *done = false;
    }
}

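The kernel takes values that are allocated on the host, modifies them on the device, and the results are meant to be written back to the host.

So I declared these two variables and copied them to the device like this: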

bool h_done = true;
bool* d_done;
int* d_current_depth;
int h_current_depth = 0;

cudaMalloc((void**)&d_done, sizeof(bool));
cudaMalloc((void**)&d_current_depth, sizeof(int));
cudaMemcpy(d_done, &h_done, sizeof(bool), cudaMemcpyHostToDevice);
cudaMemcpy(d_current_depth, &h_current_depth, sizeof(int), cudaMemcpyHostToDevice);

Then I launch the kernel in a loop here:

bfs<<<blocksPerGrid, threadsPerBlock>>>(h_edges, h_vertices, d_current_depth, d_done);

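The code compiles and runs fine, but the host values are never modified on the device, and vice versa. I have read the NVIDIA sample code in detail but can't seem to get this right. I'm new to CUDA. Any help is appreciated.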

Best answer

This:

bfs<<<blocksPerGrid, threadsPerBlock>>>(h_edges, h_vertices, d_current_depth, d_done);

is almost certainly wrong.

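Unless you are using managed memory (which I doubt), h_edges and h_vertices are, judging by their names, variables in host memory. You cannot pass ordinary host pointers to a kernel and dereference or modify them in device code. Because of this error, your kernel is probably not running to completion at all.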
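A minimal sketch of the fix, under stated assumptions: the question does not show the definition of Edge or how h_edges and h_vertices are built, so the struct layout, the helper name bfs_depths, and the simplified kernel bfs_step below are all hypothetical. The sketch also passes the current depth by value instead of through a device pointer, which is a change from the question's design rather than something the answer prescribes. The point it illustrates is that every pointer handed to the kernel comes from cudaMalloc and that results are copied back with cudaMemcpy.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed layout: the question only shows that Edge has .first and .second.
struct Edge { int first; int second; };

// Simplified per-edge step (hypothetical, not the asker's exact kernel).
// The depth is passed by value and a bounds check guards the last block.
__global__ void bfs_step(const Edge* edges, int* vertices, int num_edges,
                         int depth, bool* done) {
    int e = blockDim.x * blockIdx.x + threadIdx.x;
    if (e >= num_edges) return;
    int a = edges[e].first;
    int b = edges[e].second;
    // Several threads may store the same depth + 1 to one vertex; every
    // writer stores an identical value, so the result does not depend on order.
    if (vertices[a] == depth && vertices[b] == -1) { vertices[b] = depth + 1; *done = false; }
    if (vertices[b] == depth && vertices[a] == -1) { vertices[a] = depth + 1; *done = false; }
}

// Copies the graph to the device, runs one kernel launch per BFS level,
// and returns the per-vertex depths copied back to the host.
std::vector<int> bfs_depths(const std::vector<Edge>& h_edges,
                            std::vector<int> h_vertices) {
    int num_edges = static_cast<int>(h_edges.size());
    Edge* d_edges = nullptr;
    int* d_vertices = nullptr;
    bool* d_done = nullptr;

    // Device buffers: these, not the host containers, go to the kernel.
    cudaMalloc((void**)&d_edges, h_edges.size() * sizeof(Edge));
    cudaMalloc((void**)&d_vertices, h_vertices.size() * sizeof(int));
    cudaMalloc((void**)&d_done, sizeof(bool));
    cudaMemcpy(d_edges, h_edges.data(), h_edges.size() * sizeof(Edge),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_vertices, h_vertices.data(), h_vertices.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    int threadsPerBlock = 128;
    int blocksPerGrid = (num_edges + threadsPerBlock - 1) / threadsPerBlock;

    bool h_done = false;
    for (int depth = 0; !h_done; ++depth) {
        h_done = true;  // the kernel clears this if it labels any vertex
        cudaMemcpy(d_done, &h_done, sizeof(bool), cudaMemcpyHostToDevice);
        bfs_step<<<blocksPerGrid, threadsPerBlock>>>(d_edges, d_vertices,
                                                     num_edges, depth, d_done);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(&h_done, d_done, sizeof(bool), cudaMemcpyDeviceToHost);
    }

    cudaMemcpy(h_vertices.data(), d_vertices, h_vertices.size() * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_edges);
    cudaFree(d_vertices);
    cudaFree(d_done);
    return h_vertices;
}

int main() {
    // Hypothetical path graph 0-1-2-3; vertex 0 is the source, -1 means unvisited.
    std::vector<Edge> edges = {{0, 1}, {1, 2}, {2, 3}};
    std::vector<int> depths = bfs_depths(edges, {0, -1, -1, -1});
    for (size_t v = 0; v < depths.size(); ++v)
        printf("vertex %zu: depth %d\n", v, depths[v]);
    return 0;
}
```

Return codes are ignored here to keep the sketch short; real code should check every CUDA call.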

The unspecified launch failure that your code is reporting is most likely caused by this.
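A failure like this only becomes visible if the return codes of the CUDA runtime calls are inspected. The following is a minimal sketch of that pattern, not the asker's code: the helper names check and launch_checked and the kernel set_flag are hypothetical. It checks cudaGetLastError() right after the launch and cudaDeviceSynchronize() afterwards, since faults that happen while the kernel executes (such as dereferencing an invalid pointer) are typically reported only once the host synchronizes with the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel used only to have something to launch.
__global__ void set_flag(int* flag) { *flag = 1; }

// Hypothetical helper: prints a message and returns false on any CUDA error.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Launches set_flag on a device pointer and checks every step.
// On success, *h_result holds the value the kernel wrote.
bool launch_checked(int* h_result) {
    int* d_flag = nullptr;
    if (!check(cudaMalloc((void**)&d_flag, sizeof(int)), "cudaMalloc"))
        return false;

    bool ok = check(cudaMemset(d_flag, 0, sizeof(int)), "cudaMemset");
    if (ok) {
        set_flag<<<1, 1>>>(d_flag);
        // Launch-time problems (for example a bad configuration) show up here.
        ok = check(cudaGetLastError(), "kernel launch")
          // Faults during execution show up once the host synchronizes.
          && check(cudaDeviceSynchronize(), "kernel execution");
    }
    ok = ok && check(cudaMemcpy(h_result, d_flag, sizeof(int),
                                cudaMemcpyDeviceToHost), "cudaMemcpy");
    cudaFree(d_flag);
    return ok;
}

int main() {
    int result = 0;
    if (launch_checked(&result))
        printf("flag = %d\n", result);
    return 0;
}
```

Wrapping the calls from the question in the same way should show which call first returns an error.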

Regarding "c - Passing variables between host and device in CUDA", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34112185/
