c++ - I want to optimize this short loop

Tags: c++ performance algorithm

I would like to optimize this simple loop:

    unsigned int i;
    while (j-- != 0) {  // j is an unsigned int with a start value of about N = 36,000,000
        float sub = 0;
        i = 1;
        unsigned int c = j + s[1];
        while (c < N) {
            sub += d[i][j] * x[c];  // d[][] and x[] are arrays of float
            i++;
            c = j + s[i];           // s[] is an array of unsigned int with 6 entries
        }
        x[j] -= sub;                // only one memory write per j
    }

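On a 4000 MHz AMD Bulldozer, the loop takes about one second to execute. I have considered SIMD and OpenMP (which I usually use to gain speed), but this loop is recursive: each x[j] depends on entries x[c] with c > j that were updated in earlier iterations.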

Any suggestions?

Best answer

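I think you may want to transpose the matrix d, that is, store it so that the two indices are exchanged and i becomes the last (contiguous) index: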

    sub += d[j][i]*x[c];

instead of

    sub += d[i][j]*x[c];

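This should give better cache performance: the inner loop runs over i, so with the transposed layout the few values it reads for a given j sit next to each other in memory instead of being a whole row apart.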
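To make the suggestion concrete, here is a minimal sketch of the transposed layout, assuming d is stored as one flat array. The names `kSlots`, `transpose_d` and `solve_transposed` are hypothetical and not from the question, and the slot count of 6 is taken from the comment that s[] has 6 entries. The inner loop is bounded by `kSlots` as well as by `c < N`, which is a small safety addition over the original loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constant: number of offset slots per j (assumed from "6 entries").
constexpr std::size_t kSlots = 6;

// Original layout:   d_orig[i * N + j]       (stride N between i and i + 1)
// Transposed layout: d_t[j * kSlots + i]     (all values for one j are adjacent)
std::vector<float> transpose_d(const std::vector<float>& d_orig, std::size_t N) {
    assert(d_orig.size() == kSlots * N);
    std::vector<float> d_t(N * kSlots);
    for (std::size_t i = 0; i < kSlots; ++i)
        for (std::size_t j = 0; j < N; ++j)
            d_t[j * kSlots + i] = d_orig[i * N + j];
    return d_t;
}

// Same backward sweep as the loop in the question, reading the transposed layout.
// As in the question, slot 0 is unused and the sweep stops at the first c >= N.
void solve_transposed(std::vector<float>& x, const std::vector<float>& d_t,
                      const std::vector<unsigned>& s) {
    const std::size_t N = x.size();
    assert(s.size() == kSlots);
    assert(d_t.size() == N * kSlots);
    std::size_t j = N;
    while (j-- != 0) {
        float sub = 0.0f;
        const float* row = &d_t[j * kSlots];   // contiguous coefficients for this j
        for (std::size_t i = 1; i < kSlots; ++i) {
            const std::size_t c = j + s[i];
            if (c >= N) break;
            sub += row[i] * x[c];
        }
        x[j] -= sub;                            // still one memory write per j
    }
}
```

The transpose is a one-time cost, so this only pays off if d is reused or can be built in the transposed layout from the start. Whether it actually helps on a given machine is something to measure rather than assume.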

Regarding "c++ - I want to optimize this short loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18262547/
