memory-management - When to use cudaHostRegister() and cudaHostAlloc()? What is the meaning of "pinned or page-locked" memory? What are the equivalents in OpenCL?

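I'm new to these APIs from Nvidia and some of the terminology is not clear to me. I was wondering if somebody could help me understand, in simple terms, when and how to use these CUDA commands. To be more precise:

While studying how to speed up applications by running kernels in parallel (with CUDA, for example), at some point I ran into the problem of speeding up the host-device interaction.
I have gathered some information by searching the web, but I'm a little confused.
It is clear that you can go faster when it is possible to use cudaHostRegister() and/or cudaHostAlloc(). The video I found explains that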

"you can use the cudaHostRegister() command to take some data (already allocated) and pin it avoiding extra copy to take into the GPU".

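What is the meaning of "pin the memory"? Why is it so fast? How was this handled previously in this field? Later, in the same video as in the link, they continue explaining that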

"if you are transferring PINNED memory, you can use the asynchronous memory transfer, cudaMemcpyAsync(), which lets the CPU keep working during the memory transfer".

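Are the PCIe transactions managed entirely from the CPU? Is there a bus manager that takes care of this?
Partial answers are also much appreciated, so that I can put the puzzle together at the end.

I would also appreciate some links about the equivalent APIs in OpenCL.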

Best answer

What is the meaning of "pin the memory"?

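It means making the memory page-locked. That tells the operating system's virtual memory manager that the memory pages must stay in physical RAM, so that the GPU can access them directly across the PCI-express bus.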
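As a rough illustration of the two calls named in the question, the sketch below shows both ways of obtaining page-locked host memory: `cudaHostAlloc()` creates a new pinned buffer, while `cudaHostRegister()` pins a buffer that was already allocated (here with `malloc`). The helper names `use_host_alloc` and `use_host_register` are hypothetical and only serve the example; error handling is reduced to a boolean result.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>
#include <cstring>

// Path 1: ask the CUDA runtime for host memory that is page-locked from the start.
// (Hypothetical helper name, for illustration only.)
bool use_host_alloc(size_t bytes) {
    void* pinned = nullptr;
    if (cudaHostAlloc(&pinned, bytes, cudaHostAllocDefault) != cudaSuccess) {
        return false;
    }
    std::memset(pinned, 0, bytes);  // usable like any ordinary host pointer
    // Pinned allocations are released with cudaFreeHost(), not free().
    return cudaFreeHost(pinned) == cudaSuccess;
}

// Path 2: page-lock a buffer that already exists, e.g. one handed over by other code.
// (Hypothetical helper name, for illustration only.)
bool use_host_register(size_t bytes) {
    void* buf = std::malloc(bytes);
    if (buf == nullptr) {
        return false;
    }
    bool ok = cudaHostRegister(buf, bytes, cudaHostRegisterDefault) == cudaSuccess;
    if (ok) {
        // ... host-device transfers using buf would go here ...
        // Unpin before handing the memory back to the C allocator.
        ok = cudaHostUnregister(buf) == cudaSuccess;
    }
    std::free(buf);
    return ok;
}
```

Under these assumptions, `cudaHostAlloc()` is the natural choice when you control the allocation, and `cudaHostRegister()` when the data already lives in a buffer you did not allocate yourself. Pinned pages cannot be swapped out, so it is usually sensible to pin only the buffers that actually take part in transfers.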

Why is it so fast?

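In a word, DMA. When the memory is page-locked, the GPU DMA engine can run the transfer directly without involving the host CPU, which reduces overall latency and decreases net transfer times.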
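The asynchronous transfer mentioned in the question builds on this. A minimal sketch, assuming a single device and a hypothetical helper `async_roundtrip` with a toy kernel `scale` (neither comes from the original thread): the copies and the kernel are queued on a stream, the calls return to the CPU right away, and the host only waits at `cudaStreamSynchronize()`.

```cuda
#include <cuda_runtime.h>

// Toy kernel (hypothetical, for illustration): multiply every element by f.
__global__ void scale(float* data, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

// Copies n floats to the GPU, doubles them, and copies them back,
// all queued asynchronously on one stream. Returns true on success.
bool async_roundtrip(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* h = nullptr;   // pinned host buffer
    float* d = nullptr;   // device buffer
    cudaStream_t stream;

    if (cudaHostAlloc((void**)&h, bytes, cudaHostAllocDefault) != cudaSuccess) return false;
    if (cudaMalloc((void**)&d, bytes) != cudaSuccess) { cudaFreeHost(h); return false; }
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(d); cudaFreeHost(h); return false; }

    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // These calls only enqueue work; with pinned memory the DMA engine
    // performs the copies while the CPU continues.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2.0f);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

    // The CPU could do unrelated work here, as long as it does not touch h.

    bool ok = cudaStreamSynchronize(stream) == cudaSuccess;  // wait for the queued work
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == 2.0f);

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}
```

If `h` were ordinary pageable memory, `cudaMemcpyAsync()` would still be accepted, but the runtime generally has to stage the data through an internal pinned buffer, so the copy may not overlap with CPU work in the same way.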

Are the PCIe transaction managed entirely from the CPU?

No, see above.

Is there a manager of a bus that takes care of this?

No. The GPU manages the transfers. There is no such thing as a bus master in this context.

Source: this question and its answer come from Stack Overflow: https://stackoverflow.com/questions/39454465/
