cuda - Can I use thrust::host_vector, or must I use cudaHostAlloc, for zero-copy with Thrust?

I want to use zero-copy on mapped memory via cudaHostGetDevicePointer. Can I use thrust::host_vector, or must I use cudaHostAlloc(..., cudaHostAllocMapped)? Or is this somehow easier to do with Thrust?

Best answer

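I am pretty sure it is still not possible to use a thrust::host_vector as a mapped host allocation. There is a pinned memory allocator, but I don't believe mapped memory is available through it. What you need to do is something like this: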

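1. Allocate mapped, pinned host memory with cudaHostAlloc.

2. Get the device pointer for the zero-copy memory with cudaHostGetDevicePointer.

3. Create a thrust::device_ptr from that device pointer using thrust::device_pointer_cast.

4. Either construct a thrust::device_vector from the thrust::device_ptr range, or pass the thrust::device_ptr directly to any algorithm that accepts an iterator. Note that constructing a device_vector copies the data into a separate device allocation, so only the direct device_ptr route keeps the access zero-copy.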
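The steps above can be sketched as a small program. This is a minimal illustration rather than a definitive implementation: the buffer size, the CHECK macro, and the choice of thrust::reduce and thrust::transform as example algorithms are assumptions made for the sketch, not part of the original answer. The call to cudaSetDeviceFlags(cudaDeviceMapHost) is included on the assumption that the code may run on a setup where host mapping is not already enabled by default.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <thrust/transform.h>

// Hypothetical helper for this sketch: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",        \
                         cudaGetErrorString(err_), __LINE__);         \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main() {
    const int n = 1 << 20;  // illustrative size

    // Request host-memory mapping before the CUDA context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Step 1: allocate mapped, pinned host memory.
    int* h_data = nullptr;
    CHECK(cudaHostAlloc(reinterpret_cast<void**>(&h_data),
                        n * sizeof(int), cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) h_data[i] = 1;

    // Step 2: get the device-side pointer that aliases the host buffer.
    int* d_raw = nullptr;
    CHECK(cudaHostGetDevicePointer(reinterpret_cast<void**>(&d_raw), h_data, 0));

    // Step 3: wrap the raw pointer so Thrust dispatches to the device backend.
    thrust::device_ptr<int> d_ptr = thrust::device_pointer_cast(d_raw);

    // Step 4: pass the device_ptr directly to algorithms; no explicit copy is made.
    int sum = thrust::reduce(d_ptr, d_ptr + n, 0);
    thrust::transform(d_ptr, d_ptr + n, d_ptr, thrust::negate<int>());

    // Make sure device work has finished before reading the buffer on the host.
    CHECK(cudaDeviceSynchronize());

    std::printf("sum = %d, h_data[0] after negate = %d\n", sum, h_data[0]);

    const bool ok = (sum == n) && (h_data[0] == -1);
    CHECK(cudaFreeHost(h_data));
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Because the kernels read and write the host buffer over the bus, the result of the transform is visible through h_data without any cudaMemcpy. Building the sketch requires nvcc, for example `nvcc zero_copy.cu -o zero_copy` (the file name is arbitrary).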

Source: a similar question on Stack Overflow, "cuda - Can I use thrust::host_vector, or must I use cudaHostAlloc, for zero-copy with Thrust?": https://stackoverflow.com/questions/11692326/
