c++ - CUDA pinned memory and coalescing

Tags: c++ memory cuda coalescing

On compute capability 2.x devices, how can I make sure the GPU uses coalesced memory accesses when working with mapped pinned memory, given that 2D data normally needs to be padded when using global memory?

I can't seem to find any information on this anywhere; maybe I should look harder, or maybe I'm missing something. Any pointers in the right direction are welcome...

Best answer

The coalescing approach should be applied when using zero-copy memory. Quoting the CUDA C Best Practices Guide:

Because the data is not cached on the GPU, mapped pinned memory should be read or written only once, and the global loads and stores that read and write the memory should be coalesced.
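As a minimal sketch of what that advice looks like in practice (the kernel, sizes, and names here are illustrative, not taken from the question): each thread touches one consecutive element of the mapped buffer exactly once, so a warp's loads and stores are coalesced. The runtime calls are the standard zero-copy ones: cudaSetDeviceFlags(cudaDeviceMapHost), cudaHostAlloc with cudaHostAllocMapped, and cudaHostGetDevicePointer.

#include <cstdio>
#include <cuda_runtime.h>

// Each thread reads and writes one consecutive float, so a warp's accesses to
// the mapped buffer fall in contiguous, aligned segments (coalesced), and each
// element is touched exactly once, as the Best Practices Guide recommends.
__global__ void scale(const float *in, float *out, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * s;
}

int main()
{
    const int n = 1 << 20;

    cudaSetDeviceFlags(cudaDeviceMapHost);          // enable mapped pinned memory

    float *h_in, *h_out;                            // host (pinned, mapped) buffers
    cudaHostAlloc(&h_in,  n * sizeof(float), cudaHostAllocMapped);
    cudaHostAlloc(&h_out, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_in[i] = float(i);

    float *d_in, *d_out;                            // device aliases of the same memory
    cudaHostGetDevicePointer(&d_in,  h_in,  0);
    cudaHostGetDevicePointer(&d_out, h_out, 0);

    scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);
    cudaDeviceSynchronize();                        // the kernel wrote straight to host memory

    printf("h_out[42] = %f\n", h_out[42]);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return 0;
}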

And quoting the book CUDA Programming by S. Cook:

If you think about what happens with access to global memory, an entire cache line is brought in from memory on compute 2.x hardware. Even on compute 1.x hardware the same 128 bytes, potentially reduced to 64 or 32, is fetched from global memory. NVIDIA does not publish the size of the PCI-E transfers it uses, or details on how zero copy is actually implemented. However, the coalescing approach used for global memory could be used with PCI-E transfer. The warp memory latency hiding model can equally be applied to PCI-E transfers, providing there is enough arithmetic density to hide the latency of the PCI-E transfers.
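The padding part of the original question carries over in the same spirit: there is no cudaMallocPitch equivalent for mapped host memory, so if you want row-aligned 2D access you pad the row pitch yourself. Below is a minimal sketch, assuming a 128-byte alignment target (the compute 2.x global-memory transaction size); whether the PCI-E path benefits to the same degree is not documented, as the quote above notes, and paddedPitch/addRows are hypothetical names for illustration.

#include <cstring>
#include <cuda_runtime.h>

// Round a row width in bytes up to the next multiple of 'alignment' (assumed
// 128 bytes, matching the compute 2.x global-memory transaction size).
static size_t paddedPitch(size_t widthBytes, size_t alignment = 128)
{
    return ((widthBytes + alignment - 1) / alignment) * alignment;
}

// With the padded pitch every row starts on an aligned boundary, so a warp
// sweeping along x issues aligned, coalesced accesses into the mapped buffer.
__global__ void addRows(const float *a, float *b, size_t pitchElems,
                        int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        b[y * pitchElems + x] += a[y * pitchElems + x];
}

int main()
{
    const int width = 1000, height = 512;               // 4000 bytes of data per row
    size_t pitch = paddedPitch(width * sizeof(float));  // padded to 4096 bytes
    size_t pitchElems = pitch / sizeof(float);

    cudaSetDeviceFlags(cudaDeviceMapHost);
    float *h_a, *h_b;
    cudaHostAlloc(&h_a, pitch * height, cudaHostAllocMapped);  // base is page-aligned
    cudaHostAlloc(&h_b, pitch * height, cudaHostAllocMapped);
    memset(h_a, 0, pitch * height);
    memset(h_b, 0, pitch * height);

    float *d_a, *d_b;
    cudaHostGetDevicePointer(&d_a, h_a, 0);
    cudaHostGetDevicePointer(&d_b, h_b, 0);

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    addRows<<<grid, block>>>(d_a, d_b, pitchElems, width, height);
    cudaDeviceSynchronize();

    cudaFreeHost(h_a);
    cudaFreeHost(h_b);
    return 0;
}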

This answer to "c++ - CUDA pinned memory and coalescing" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/19101354/
