c - Why is the second pass over an array slower?

Tags: c arrays performance

This simple C program first creates an array of 0xFFFFFF elements, then passes over it twice, measuring how long each pass takes:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TESTSIZ 0xffffff

char testcases[TESTSIZ];

void gentestcases(void)
{
        size_t i = 0;
        while(i < TESTSIZ)
                testcases[i++] = rand()%128;

        return;
}

long long time_elapsed(struct timespec beg, struct timespec end)
{
        if(end.tv_nsec < beg.tv_nsec) {
                end.tv_nsec += 1000000000;
                end.tv_sec--;
        }

        return 1000000000ll*(end.tv_sec-beg.tv_sec) + end.tv_nsec-beg.tv_nsec;
}

long long test( int(*func)(int) )
{
        struct timespec beg, end;

        clock_gettime(CLOCK_MONOTONIC, &beg);

        int volatile sink;
        size_t i = 0;
        while(i < TESTSIZ)
                sink = islower(testcases[i++]);

        clock_gettime(CLOCK_MONOTONIC, &end);

        return time_elapsed(beg, end);
}

int main()
{
        gentestcases();

        struct timespec beg, end;

        printf("1st pass took %lld nsecs\n", test(islower));
        printf("2nd pass took %lld nsecs\n", test(islower));
}

I compile it with gcc -O2 -std=gnu89 -o sb sillybench.c

The result I usually get is that the second pass over the array is slower. The effect is small but noticeable (1-3 msecs) and, with one exception, repeatable:

m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13098789 nsecs
2nd pass took 13114677 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13052105 nsecs
2nd pass took 13134187 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13118069 nsecs
2nd pass took 13074199 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13038579 nsecs
2nd pass took 13079995 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13070334 nsecs
2nd pass took 13324378 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13031000 nsecs
2nd pass took 13167349 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13019961 nsecs
2nd pass took 13310211 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13041332 nsecs
2nd pass took 13311737 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13030913 nsecs
2nd pass took 13177423 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13060570 nsecs
2nd pass took 13387024 nsecs

Why does this happen? If anything, I would expect the first pass over the array to be the slower one, not the second!

In case it matters:

m@m-X555LJ ~/UVA/fastIO $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

System:    Host: m-X555LJ Kernel: 4.4.0-21-generic x86_64 (64 bit gcc: 5.3.1)
           Desktop: Cinnamon 3.0.7 (Gtk 2.24.30) Distro: Linux Mint 18 Sarah

CPU:       Dual core Intel Core i5-5200U (-HT-MCP-) cache: 3072 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 8786
           clock speeds: max: 2700 MHz 1: 2200 MHz 2: 2202 MHz 3: 2200 MHz
           4: 2200 MHz

Best answer

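This effect is very likely caused by turbo mode (Intel Turbo Boost Technology). Turbo mode allows a processor core to run above its nominal clock frequency. One of the limiting factors is a time burst*: typically the processor will hold its highest frequency only for a fraction of a second. It is very likely that the first loop runs at a higher clock frequency than the second loop.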

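You can confirm this by manually pinning the nominal frequency (2.20 GHz for your processor), e.g. by using cpufrequtils or cpupower. However, many systems use intel_pstate, which does not allow the userspace governor. In that case you can instead disable turbo mode for intel_pstate, or disable intel_pstate altogether.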

Without turbo mode, the performance should be uniform.

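*: Temperature is another factor, but I doubt that it plays a role for a 10 ms benchmark. To illustrate, suppose the CPU exceeds its 15 W TDP and draws 20 W: even a minuscule 1 g of copper would only heat up by 0.5 K after 10 ms. What I usually see is a short, distinct burst (time-limited, tens of milliseconds to seconds), followed by a slow, steady decline (temperature-limited, tens of seconds to minutes).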
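The 0.5 K figure can be checked with a quick energy balance. This assumes a specific heat capacity for copper of about 0.385 J/(g·K), a textbook value that the answer does not state, and that the full 20 W goes into the 1 g of copper with no heat lost:

```latex
\Delta T = \frac{P\,t}{m\,c}
         = \frac{20\,\mathrm{W} \times 0.01\,\mathrm{s}}
                {1\,\mathrm{g} \times 0.385\,\mathrm{J\,g^{-1}\,K^{-1}}}
         = \frac{0.2\,\mathrm{J}}{0.385\,\mathrm{J\,K^{-1}}}
         \approx 0.52\,\mathrm{K}
```

A real CPU package has far more thermal mass than 1 g, so the actual temperature rise over 10 ms would be smaller still.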

Note: gentestcases runs for a considerable time (e.g. 240 ms) before the first actual test, and this already counts towards the processor's "sprint".

Source: "c - Why is the second pass over an array slower?" on Stack Overflow: https://stackoverflow.com/questions/44503489/
