multithreading - Does cudaDeviceSynchronize() wait for completion only in the current CUDA context, or in all contexts?

I am using CUDA 6.5 with 4 Kepler GPUs.

I use multithreading and the CUDA runtime API, and I access the CUDA contexts from different CPU threads (using OpenMP, though that does not really matter).

When I call cudaDeviceSynchronize(), will it wait for kernel(s) to finish only in the current CUDA context, the one selected by the latest call to cudaSetDevice(), or in all CUDA contexts?

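If it waits for kernels to finish in all CUDA contexts, does that mean all CUDA contexts used by the current CPU thread (for example, CPU thread_0 would wait for GPUs 0 and 1), or all CUDA contexts in general (CPU thread_0 would wait for GPUs 0, 1, 2 and 3)?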

Here is the code:

// Using OpenMP requires setting:
// MSVS option: -Xcompiler "/openmp"
// GCC option: -Xcompiler -fopenmp
#include <omp.h>

int main() {

        // run two threads, with omp_get_thread_num() = 0 and 1
    #pragma omp parallel num_threads(2)
    {
        int omp_threadId = omp_get_thread_num();

        // CPU thread 0
        if(omp_threadId == 0) {

            cudaSetDevice(0);
            kernel_0<<<...>>>(...);
            cudaSetDevice(1);
            kernel_1<<<...>>>(...);

            cudaDeviceSynchronize(); // which kernel(s) will this wait for?

        // CPU thread 1
        } else if(omp_threadId == 1) {

            cudaSetDevice(2);
            kernel_2<<<...>>>(...);
            cudaSetDevice(3);
            kernel_3<<<...>>>(...);

            cudaDeviceSynchronize(); // which kernel(s) will this wait for?

        }
    }

    return 0;
}

Best answer

When I call cudaDeviceSynchronize(), will it wait for kernel(s) to finish only in the current CUDA context, the one selected by the latest call to cudaSetDevice(), or in all CUDA contexts?

cudaDeviceSynchronize() synchronizes all streams in the current CUDA context only.

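Note: cudaDeviceSynchronize() only synchronizes the host with the currently set GPU. If multiple GPUs are in use and all of them need to be synchronized, cudaDeviceSynchronize() must be called separately for each one.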

Here is a minimal example:

cudaSetDevice(0); cudaDeviceSynchronize();
cudaSetDevice(1); cudaDeviceSynchronize();
...
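The same pattern can be written as a loop over all visible devices. This is only a sketch: syncAllDevices is a hypothetical helper name that does not appear in the original answer, and the error handling is one possible choice. Note that it visits every device the process can see, including ones the calling thread never launched work on.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): waits for all previously
// issued work on every visible device, one device at a time.
// Returns the first error encountered, or cudaSuccess.
cudaError_t syncAllDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) return err;

    // Remember which device this host thread had selected.
    int previous = 0;
    err = cudaGetDevice(&previous);
    if (err != cudaSuccess) return err;

    cudaError_t firstError = cudaSuccess;
    for (int dev = 0; dev < count; ++dev) {
        // cudaDeviceSynchronize() only covers the current device,
        // so select each device before synchronizing it.
        err = cudaSetDevice(dev);
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        if (err != cudaSuccess && firstError == cudaSuccess) firstError = err;
    }

    // Restore the caller's device selection.
    err = cudaSetDevice(previous);
    if (firstError == cudaSuccess) firstError = err;
    return firstError;
}
```

In the question's code, each CPU thread would more likely synchronize only its own two devices, for example cudaSetDevice(0); cudaDeviceSynchronize(); followed by cudaSetDevice(1); cudaDeviceSynchronize(); in thread 0.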

Source: Pawel Pomorski, "CUDA on Multiple GPUs" slides. Linked here.

This question was originally asked on Stack Overflow: https://stackoverflow.com/questions/26841476/
