C++: optimizing a simple loop

I'm using Visual Studio 2012 and building in x64 Release mode. The code below accounts for 33.5% of my program's run time, as measured with the Visual Studio Profiler.

    //every variable is unsigned int or unsigned int*

    for(unsigned int i = 0; i < num; i++)
    {
        unique[ids[i]]++;//2.1%
        total[ids[i]] += main_list[id];//31.4%
    }

Can anyone suggest a way to reduce the running time of this loop?

Edit: based on your input, I tried the following code:

    const unsigned int now = main_list[id];

    for(unsigned int i = ids[0], j = 0; j < num; j++)
    {
        ++unique[i];//2.0%
        total[i] += now;//16.7%
        i = ids[j];//16.8%
    }

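This supports the theory that CPU branch prediction may be failing because the locations are random (by the way, they are not completely random; they are sorted). Is there any way to speed up my code?

Second edit: I tried the following: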

    const unsigned int now = main_list[id];

    for(unsigned int i = ids[0], j = 0; j < num; j++)
    {
        total[i] += now;//2.0%
        ++unique[i];//16.7%
        i = ids[j];//16.8%
    }

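The test above should make it clear what is going on.

Best answer

Your code has no cache locality whatsoever. I'll throw out two possible ideas.

Group unique and total together.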

    struct Stuff {
        unsigned int unique, total;
    };

    for(unsigned int i = 0; i < num; i++)
    {
        Stuff& s = stuffs[ids[i]];
        s.unique++;
        s.total += main_list[id]; // <== is this supposed to be ids[i]?
    }

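This ensures that the things you access consecutively are actually next to each other in memory. As it stands, assuming num is large enough, every line is a cache miss. That is about the worst case you can get.

Sort ids. Right now you are still jumping around in memory. Let's make sure we actually walk through it in order: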
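To make the grouping idea concrete, here is a minimal self-contained sketch. The function name accumulate_grouped, the use of std::vector, and the value parameter (standing in for the loop-invariant main_list[id] from the question) are assumptions made for illustration, not part of the original code. It also assumes every id is a valid index into stuffs.

```cpp
#include <cstddef>
#include <vector>

// One record per id, so both counters for an id sit side by side in memory
// and will typically share a single cache line.
struct Stuff {
    unsigned int unique = 0;
    unsigned int total = 0;
};

// Hypothetical helper: for every id in `ids`, bump its counter and add
// `value` to its total. Assumes each id is a valid index into `stuffs`.
void accumulate_grouped(std::vector<Stuff>& stuffs,
                        const std::vector<unsigned int>& ids,
                        unsigned int value)
{
    for (std::size_t i = 0; i < ids.size(); ++i) {
        Stuff& s = stuffs[ids[i]];  // one lookup, then two adjacent writes
        ++s.unique;
        s.total += value;
    }
}
```

Compared with two separate arrays, each iteration touches one memory location per id instead of two, which is where the expected saving comes from. Whether it helps in practice depends on how large the arrays are relative to the cache, so it is worth profiling again.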
```
std::sort(ids, ids + num);
// rest of loop as before
```

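That way, while you are processing stuffs[ids[i]], stuffs[ids[i+1]] will most likely already have been prefetched. That will also save you a lot of lookup time.

For more on optimizing a simple loop in C++, see the similar question on Stack Overflow: https://stackoverflow.com/questions/32361994/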
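One way the sorted order could be exploited further is sketched below. This goes beyond what the answer states: once equal ids are adjacent, each run of duplicates can be collapsed into a single update. The name accumulate_sorted, the std::vector types, and the value parameter (standing in for main_list[id]) are assumptions for illustration, and every id is assumed to be a valid index into stuffs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Stuff {
    unsigned int unique = 0;
    unsigned int total = 0;
};

// Hypothetical helper: sorts a local copy of the ids, then performs one
// update per run of equal ids instead of one update per element.
void accumulate_sorted(std::vector<Stuff>& stuffs,
                       std::vector<unsigned int> ids,  // taken by value, sorted locally
                       unsigned int value)
{
    std::sort(ids.begin(), ids.end());

    std::size_t i = 0;
    while (i < ids.size()) {
        // Find the end of the run of ids equal to ids[i].
        std::size_t run_end = i + 1;
        while (run_end < ids.size() && ids[run_end] == ids[i]) {
            ++run_end;
        }

        const unsigned int count = static_cast<unsigned int>(run_end - i);
        Stuff& s = stuffs[ids[i]];
        s.unique += count;
        s.total += count * value;  // same result as adding `value` count times

        i = run_end;
    }
}
```

If the ids are already sorted, as the question mentions, the std::sort call can be dropped and the run-collapsing loop used on its own. Unsigned arithmetic wraps modulo 2^32 either way, so the multiplied form gives the same totals as the original per-element additions.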
