c - Different read and write counts using cachegrind and callgrind

Tags: c assembly callgrind cachegrind gem5

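I am running some experiments with Cachegrind, Callgrind, and Gem5. I noticed that a number of accesses are counted as reads by cachegrind, as writes by callgrind, and as both reads and writes by gem5.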

Let's take a very simple example:

int main() {
    int i, l;

    for (i = 0; i < 1000; i++) {
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        /* ... (100 times) */
    }
}

I compile it with:

gcc ex.c --static -o ex

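So basically, according to the asm file, addl $1, -8(%rbp) is executed 100,000 times. Since it is both a read and a write, I was expecting 100k reads and 100k writes. However, cachegrind counts them only as reads, and callgrind counts them only as writes.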

 % valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356== 
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356== 
==15356== I   refs:      111,535
==15356== I1  misses:        475
==15356== LLi misses:        280
==15356== I1  miss rate:    0.42%
==15356== LLi miss rate:    0.25%
==15356== 
==15356== D   refs:      104,894  (103,791 rd   + 1,103 wr)
==15356== D1  misses:        557  (    414 rd   +   143 wr)
==15356== LLd misses:        172  (     89 rd   +    83 wr)
==15356== D1  miss rate:     0.5% (    0.3%     +  12.9%  )
==15356== LLd miss rate:     0.1% (    0.0%     +   7.5%  )
==15356== 
==15356== LL refs:         1,032  (    889 rd   +   143 wr)
==15356== LL misses:         452  (    369 rd   +    83 wr)
==15356== LL miss rate:      0.2% (    0.1%     +   7.5%  )

-

 % valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376== 
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376== 
==15376== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376== 
==15376== I   refs:      111,532
==15376== I1  misses:        474
==15376== LLi misses:        279
==15376== I1  miss rate:    0.42%
==15376== LLi miss rate:    0.25%
==15376== 
==15376== D   refs:      104,894  (2,777 rd + 102,117 wr)
==15376== D1  misses:        557  (  406 rd +     151 wr)
==15376== LLd misses:        172  (   87 rd +      85 wr)
==15376== D1  miss rate:     0.5% ( 14.6%   +     0.1%  )
==15376== LLd miss rate:     0.1% (  3.1%   +     0.0%  )
==15376== 
==15376== LL refs:         1,031  (  880 rd +     151 wr)
==15376== LL misses:         451  (  366 rd +      85 wr)
==15376== LL miss rate:      0.2% (  0.3%   +     0.0%  )

Can someone give me a reasonable explanation? Am I right in thinking that there are in fact ~100k reads and ~100k writes (i.e. 2 cache accesses for each addl)?

Best answer

From the Cachegrind manual, section 5.7.1, Cache Simulation Specifics:

  • Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.

    Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.

It seems that callgrind's cache simulation logic differs from cachegrind's. I would expect callgrind to produce the same results as cachegrind, so maybe this is a bug?

Source: the original question on Stack Overflow, "Different read and write counts using cachegrind and callgrind": https://stackoverflow.com/questions/15790541/
