caching - How can the global memory cache replay rate exceed 100%?

Tags: caching cuda gpu gpgpu

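I am benchmarking a CUDA kernel, and the profiler reports a global memory cache replay of 216.9%.

This doesn't make much sense to me. The only way I can see a cache miss rate going above 100% is if accesses miss at multiple cache levels, but that doesn't appear to be what is happening here.

Does anyone know why this might occur?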

Best answer

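I ran into a similar problem: my global load efficiency was above 100%, and I asked about it in a separate question. Since I think the two phenomena have the same root cause, I'll quote the answer I received: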

Global Load Efficiency and Global Store Efficiency describe how well the coalescing of DRAM accesses and (L2?) cache accesses works. If they are 100 percent, you have perfect coalescing. Since efficiencies above 100 percent don't make any sense (you cannot be better than optimal), this has to be an error. The error is caused by the Visual Profiler, which counts hardware events to calculate some abstract metrics. The GPU doesn't have the "correct" events to calculate all of those metrics exactly, so the Visual Profiler has to estimate them using some complex formula and "wrong" events. Some metrics are just rough estimations, and Global Load Efficiency and Global Store Efficiency are two of them. So if such an efficiency is greater than 100 percent, it is an estimation error. As far as I observed, Global Load Efficiency and Global Store Efficiency both rose above 100 percent in some of my register-spilling kernels. That's why I assume that the Visual Profiler uses some events that may also be triggered by local memory accesses to calculate those two efficiencies. Furthermore, GPUs only use 32-bit counters, so long-running kernels tend to overflow those counters, which also causes the Visual Profiler to display wrong metrics.
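
To make the two failure modes in that quote concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the event totals, and the "requested / transferred" formula are hypothetical and are not the Visual Profiler's actual events or formulas. It only shows how a ratio built from a wrapped 32-bit counter, or from an event that also counts local memory traffic, can land above 100% even when the true value is below it.

```python
# Illustrative only: hypothetical counters and formula, NOT the real
# Visual Profiler metric definitions.

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def read_counter(true_count: int) -> int:
    """Value a 32-bit counter would report after wrapping around."""
    return true_count & COUNTER_MASK


def efficiency_percent(requested: int, transferred: int) -> float:
    """Hypothetical efficiency metric: requested / transferred, in percent."""
    return 100.0 * requested / transferred


# Case 1: counter overflow in a long-running kernel (made-up totals).
true_requested = 3_000_000_000    # fits in 32 bits
true_transferred = 4_500_000_000  # exceeds 2**32, so the counter wraps

true_eff = efficiency_percent(true_requested, true_transferred)
reported_eff = efficiency_percent(
    read_counter(true_requested), read_counter(true_transferred)
)

# Case 2: the "requested" event also fires for local memory accesses
# (e.g. register spills), while "transferred" only sees global traffic.
global_requested = 1_000
local_requested = 500             # spill traffic, wrongly included
global_transferred = 1_000

contaminated_eff = efficiency_percent(
    global_requested + local_requested, global_transferred
)

print(f"true efficiency:         {true_eff:.1f}%")
print(f"after counter wrap:      {reported_eff:.1f}%")
print(f"with local-memory noise: {contaminated_eff:.1f}%")
```

In both cases the kernel itself has not become "better than optimal"; only the inputs to the estimate are off, which is the point the quoted answer makes.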

This post about "caching - How can the global memory cache replay rate exceed 100%?" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/21079989/
