multithreading - Is there some kind of incompatibility between Boost::thread() and Nvidia CUDA?

I'm developing a generic streaming CUDA kernel execution framework that allows data copies and kernel execution to run in parallel on the GPU.

Currently I call the CUDA kernels through a static C++ wrapper function, so that I can invoke them from a .cpp file (not a .cu), like this:

//kernels.cu:

//kernel definition
__global__ void kernelCall_kernel(dataRow* in, dataRow* out, void* additionalData){
    //Do something
}

//kernel handler, so I can compile this .cu and link it with the main project and call it within a .cpp file
extern "C" void kernelCall(dataRow* in, dataRow* out, void* additionalData){
    int blocksize = 256;
    dim3 dimBlock(blocksize);
    // tableSize: definition not shown; round up so every row is covered by a thread
    dim3 dimGrid(ceil(tableSize/(float)blocksize));
    kernelCall_kernel<<<dimGrid,dimBlock>>>(in, out, additionalData);
}

If I call the handler as a normal function, the printed data is correct.

//streamProcessing.cpp
//allocations and definitions of data omitted

//copy data to GPU
cudaMemcpy(data_d,data_h,tableSize,cudaMemcpyHostToDevice);
//call:
kernelCall(data_d,result_d,NULL);
//copy data back
cudaMemcpy(result_h,result_d,resultSize,cudaMemcpyDeviceToHost);
//show result:
printTable(result_h,resultSize);// this just iterates over the data and prints it

But to allow data copies and execution to overlap on the GPU, I need to launch the kernel from its own thread, so the call creates a new boost::thread:

//allocations, definitions of data,copy data to GPU omitted
//call:
boost::thread* kernelThreadOwner = new boost::thread(kernelCall, data_d,result_d,NULL);
kernelThreadOwner->join();
//Copy data back and print omitted

When I print the result at the end, I just get garbage.

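For now I'm using only one thread, for testing purposes, so calling the function directly or calling it from a new thread shouldn't make much difference. I can't see why the direct call gives the correct result and the threaded call doesn't. Is this a problem between CUDA and Boost? Am I missing something? Thanks for any advice.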

Best answer

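The problem is that (before CUDA 4.0) a CUDA context is tied to the thread that created it. When you use two threads, you have two contexts. The context the main thread allocates in and reads from is not the same as the context of the thread that runs the kernel. Memory allocations are not portable between contexts; they are effectively separate memory spaces inside the same GPU. The device pointers passed to the new thread therefore do not refer to the main thread's allocations there, which is why the data copied back is garbage.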

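If you want to use threads in this way, you have two options. Either refactor things so that only one thread "talks" to the GPU and communicates with its parent through CPU memory, or use the CUDA context migration API, which lets a context be moved from one thread to another (via cuCtxPushCurrent and cuCtxPopCurrent). Be aware that context migration isn't free and adds latency, so if you plan to migrate contexts frequently, you may find it more efficient to switch to a design that preserves context-thread affinity.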
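The first option (one thread owns all GPU interaction and hands results back through host memory) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses std::thread rather than boost::thread, the names processOnDevice and runOnGpuThread are hypothetical, and the device work is replaced by a plain host loop so the sketch compiles without the CUDA toolkit. In a real build, the body of processOnDevice is where the allocation, both cudaMemcpy calls, and the kernelCall launch would all go, so that they share one context.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the device work. In a real build this function
// would allocate device memory, copy the input to the device, call
// kernelCall, copy the result back and free the device memory, all from
// the thread that calls it.
void processOnDevice(const std::vector<int>& in, std::vector<int>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * 2;  // placeholder for whatever the kernel computes
    }
}

// Runs every "GPU" step inside a single worker thread. Only host memory
// (hostInput, hostResult) is shared with the caller, so no device pointer
// ever crosses a thread boundary.
std::vector<int> runOnGpuThread(const std::vector<int>& hostInput) {
    std::vector<int> hostResult;
    std::thread worker([&hostInput, &hostResult] {
        processOnDevice(hostInput, hostResult);
    });
    worker.join();  // after join, hostResult is safe to read here
    assert(hostResult.size() == hostInput.size());
    return hostResult;
}
```

With this shape, the parent thread would call something like runOnGpuThread(data_h) and print the returned host buffer; it never touches data_d or result_d itself. The code in the question differs in exactly that respect: the device buffers were allocated and filled by the main thread and then handed to the worker.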

Regarding "multithreading - Is there some kind of incompatibility between Boost::thread() and Nvidia CUDA?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6101085/
