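valgrind - Why is Cachegrind not completely deterministic?

Tags: valgrind benchmarking cachegrind

Inspired by SQLite, I am considering using valgrind's "cachegrind" tool for reproducible performance benchmarking. The numbers it outputs are much more stable than those from any other timing method I have found, but they are still not deterministic. As an example, here is a simple C program: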

int main() {
  /* Initialized explicitly: reading an uninitialized local is undefined behavior. */
  volatile int x = 0;
  while (x < 1000000) {
    x++;
  }
  return 0;
}

If I compile it and run it under cachegrind, I get the following results:

$ gcc -O2 x.c -o x
$ valgrind --tool=cachegrind ./x
==11949== Cachegrind, a cache and branch-prediction profiler
==11949== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11949== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11949== Command: ./x
==11949==
--11949-- warning: L3 cache found, using its data for the LL simulation.
==11949==
==11949== I   refs:      11,158,333
==11949== I1  misses:         3,565
==11949== LLi misses:         2,611
==11949== I1  miss rate:       0.03%
==11949== LLi miss rate:       0.02%
==11949==
==11949== D   refs:       4,116,700  (3,552,970 rd   + 563,730 wr)
==11949== D1  misses:        21,119  (   19,041 rd   +   2,078 wr)
==11949== LLd misses:         7,487  (    6,148 rd   +   1,339 wr)
==11949== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
==11949== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
==11949==
==11949== LL refs:           24,684  (   22,606 rd   +   2,078 wr)
==11949== LL misses:         10,098  (    8,759 rd   +   1,339 wr)
==11949== LL miss rate:         0.1% (      0.1%     +     0.2%  )
$ valgrind --tool=cachegrind ./x
==11982== Cachegrind, a cache and branch-prediction profiler
==11982== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11982== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11982== Command: ./x
==11982==
--11982-- warning: L3 cache found, using its data for the LL simulation.
==11982==
==11982== I   refs:      11,159,225
==11982== I1  misses:         3,611
==11982== LLi misses:         2,611
==11982== I1  miss rate:       0.03%
==11982== LLi miss rate:       0.02%
==11982==
==11982== D   refs:       4,117,029  (3,553,176 rd   + 563,853 wr)
==11982== D1  misses:        21,174  (   19,090 rd   +   2,084 wr)
==11982== LLd misses:         7,496  (    6,154 rd   +   1,342 wr)
==11982== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
==11982== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
==11982==
==11982== LL refs:           24,785  (   22,701 rd   +   2,084 wr)
==11982== LL misses:         10,107  (    8,765 rd   +   1,342 wr)
==11982== LL miss rate:         0.1% (      0.1%     +     0.2%  )
$

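In this case, "I refs" differs by only 0.008% between the two runs, but I still wonder why it differs at all. In more complex programs (ones that run for tens of milliseconds), the variation can be larger. Is there any way to make the runs completely repeatable?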

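Best answer

At the end of a topic in gmane.comp.debugging.valgrind, Nicholas Nethercote (a Mozilla developer who works on the Valgrind development team) says that small variations are common when using Cachegrind (from which I infer that they do not cause major problems).

Cachegrind's manual mentions that the tool is very sensitive. For example, on Linux, address space randomisation (a security measure) can be a source of non-determinism: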

Another thing worth noting is that results are very sensitive. Changing the size of the executable being profiled, or the sizes of any of the shared libraries it uses, or even the length of their file names, can perturb the results. Variations will be small, but don't expect perfectly repeatable results if your program changes at all.

More recent GNU/Linux distributions do address space randomisation, in which identical runs of the same program have their shared libraries loaded at different locations, as a security measure. This also perturbs the results.

While these factors mean you shouldn't trust the results to be super-accurate, they should be close enough to be useful.

This question and answer about why Cachegrind is not completely deterministic come from a similar question on Stack Overflow: https://stackoverflow.com/questions/37267427/
