cuda - Why does the CUDA Profiler indicate Replayed Instructions: 82% != global replay + local replay + shared replay?

Tags: cuda gpu gpgpu

I got this information from the CUDA Profiler, and I am confused: why is
replayed instructions != global memory replay + local memory replay + shared bank conflict replay?

See the following numbers I obtained from the profiler:

Replayed Instructions(%): 81.60
Global memory replay(%): 21.80
Local memory replays(%): 0.00
Shared bank conflict replay(%): 0.00

Can you help me explain this? Are there other causes of instruction replay?

Best Answer

Because the SM can replay instructions for other reasons as well, such as divergent branch logic.

So I would assume roughly 60% of the replays are due to branching and 20% are due to global memory. Can you post a snippet?

From the F1 help menu of the CUDA 4.0 profiler:

Replayed Instructions (%) This gives the percentage of instructions replayed during kernel execution. Replayed instructions are the difference between the number of instructions that are actually issued by the hardware and the number of instructions that are to be executed by the kernel. Ideally this should be zero. This is calculated as 100 * (instructions issued - instructions executed) / instructions issued

Global memory replay (%) Percentage of replayed instructions caused due to global memory accesses. This is calculated as 100 * (l1 global load miss) / instructions issued

Local memory replay (%) Percentage of replayed instructions caused due to local memory accesses. This is calculated as 100 * (l1 local load miss + l1 local store miss) / instructions issued

Shared bank conflict replay (%) Percentage of replayed instructions caused due to shared memory bank conflicts. This is calculated as 100 * (l1 shared conflict)/ instructions issued
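Note that all four metrics share the same denominator (instructions issued), but only the total is defined as the issue/execute gap; the three components count specific L1 miss and bank-conflict events, so they need not sum to the total. A minimal sketch of that arithmetic, using hypothetical raw counter values chosen to reproduce the percentages in the question (these are illustrative numbers, not actual profiler counters):

```python
# Derived replay percentages, following the CUDA 4.0 profiler help formulas.
# Raw counter values below are hypothetical, picked so the derived
# percentages match the ones reported in the question.

instructions_issued = 10000
instructions_executed = 1840        # issued - executed = replayed instructions
l1_global_load_miss = 2180
l1_local_miss = 0                   # l1 local load miss + l1 local store miss
l1_shared_conflict = 0

replayed_pct = 100.0 * (instructions_issued - instructions_executed) / instructions_issued
global_pct = 100.0 * l1_global_load_miss / instructions_issued
local_pct = 100.0 * l1_local_miss / instructions_issued
shared_pct = 100.0 * l1_shared_conflict / instructions_issued

# The gap is replays the three memory metrics do not account for,
# e.g. replays caused by divergent branches:
other_pct = replayed_pct - (global_pct + local_pct + shared_pct)

print(f"replayed: {replayed_pct:.1f}%")      # replayed: 81.6%
print(f"unattributed: {other_pct:.1f}%")     # unattributed: 59.8%
```

This matches the answer's reading of the numbers: about 60% of the replays come from something other than global/local memory or shared bank conflicts.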

For "cuda - Why does the CUDA Profiler indicate Replayed Instructions: 82% != global replay + local replay + shared replay?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7187489/
