cuda - Why does the CUDA Profiler indicate Replayed Instructions: 82% != global replay + local replay + shared replay?

Tags: cuda gpu gpgpu

I got this information from the CUDA Profiler, and I am confused: why is
replayed instructions != global memory replay + local memory replay + shared bank conflict replay?

See the following numbers I obtained from the profiler:

Replayed Instructions(%): 81.60
Global memory replay(%): 21.80
Local memory replays(%): 0.00
Shared bank conflict replay(%): 0.00

Can you help me explain this? Are there other causes of instruction replay?

Best Answer

Because the SM can replay instructions for other reasons as well, such as divergent branch logic.

So I would assume roughly 60% of the replays are due to branching and 20% are due to global memory. Can you post a snippet?

From the F1 help menu of the CUDA 4.0 profiler:

Replayed Instructions (%) This gives the percentage of instructions replayed during kernel execution. Replayed instructions are the difference between the number of instructions that are actually issued by the hardware and the number of instructions that are to be executed by the kernel. Ideally this should be zero. This is calculated as 100 * (instructions issued - instructions executed) / instructions issued

Global memory replay (%) Percentage of replayed instructions caused due to global memory accesses. This is calculated as 100 * (l1 global load miss) / instructions issued

Local memory replay (%) Percentage of replayed instructions caused due to local memory accesses. This is calculated as 100 * (l1 local load miss + l1 local store miss) / instructions issued

Shared bank conflict replay (%) Percentage of replayed instructions caused due to shared memory bank conflicts. This is calculated as 100 * (l1 shared conflict)/ instructions issued
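Note that all four metrics share the same denominator (instructions issued), but only the total is defined as the issue/execute gap; the three components count specific L1 miss and bank-conflict events, so they need not sum to the total. A minimal sketch of that arithmetic, using hypothetical raw counter values chosen to reproduce the percentages in the question (these are illustrative numbers, not actual profiler counters):

```python
# Derived replay percentages, following the CUDA 4.0 profiler help formulas.
# Raw counter values below are hypothetical, picked so the derived
# percentages match the ones reported in the question.

instructions_issued = 10000
instructions_executed = 1840        # issued - executed = replayed instructions
l1_global_load_miss = 2180
l1_local_miss = 0                   # l1 local load miss + l1 local store miss
l1_shared_conflict = 0

replayed_pct = 100.0 * (instructions_issued - instructions_executed) / instructions_issued
global_pct = 100.0 * l1_global_load_miss / instructions_issued
local_pct = 100.0 * l1_local_miss / instructions_issued
shared_pct = 100.0 * l1_shared_conflict / instructions_issued

# The gap is replays the three memory metrics do not account for,
# e.g. replays caused by divergent branches:
other_pct = replayed_pct - (global_pct + local_pct + shared_pct)

print(f"replayed: {replayed_pct:.1f}%")      # replayed: 81.6%
print(f"unattributed: {other_pct:.1f}%")     # unattributed: 59.8%
```

This matches the answer's reading of the numbers: about 60% of the replays come from something other than global/local memory or shared bank conflicts.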

For "cuda - Why does the CUDA Profiler indicate Replayed Instructions: 82% != global replay + local replay + shared replay?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7187489/
