c++ - Extreme slowdown when starting at the second permutation

Tags: c++ performance function permutation compiler-optimization

Consider the following code:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    // Start from the identity permutation 0, 1, ..., 11.
    std::vector<int> v(12);
    std::iota(v.begin(), v.end(), 0);

    // Uncommenting the next line skips the first permutation before timing starts.
    //std::next_permutation(v.begin(), v.end());

    using clock = std::chrono::high_resolution_clock;
    auto start = clock::now();  // now() is static, so no clock object is needed

    // Count every remaining permutation in lexicographic order.
    unsigned long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));

    auto end = clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    std::cout << counter << " permutations took " << duration.count() / 1000.0 << " s\n";
}

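Compiled with GCC (MinGW) 5.3 at -O2 on my 4.1 GHz AMD CPU, this takes 2.3 s. However, if I uncomment the commented-out line, it slows down to 3.4 s. I would have expected a minimal speedup, because we measure the time for one permutation fewer. With -O3 the difference is less extreme: 2.0 s versus 2.4 s.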

Can anyone explain this? Could a super-smart compiler detect that I want to traverse all permutations and optimize this code accordingly?

Best answer

I think the compiler gets confused because you call the function on two separate lines of your code, which causes it not to be inlined.

GCC 8.0.0 behaves the same way as yours.

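See "Benefits of inline functions in C++?": inlining gives the compiler a simple mechanism for applying more optimizations, so in some cases losing the inlining can cause a severe drop in performance.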
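One way to probe this hypothesis is to compare the two-call-site version with a variant that skips the first permutation while keeping a single call to std::next_permutation. The sketch below is only an illustration: the function names count_two_call_sites and count_one_call_site are hypothetical, both assume n >= 2, and whether the second variant is actually faster depends on the compiler and flags, so it has to be measured.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: the question's code with the extra line uncommented,
// i.e. std::next_permutation is called from two separate places.
// Assumes n >= 2.
unsigned long count_two_call_sites(int n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);

    std::next_permutation(v.begin(), v.end());  // call site 1: skip the first permutation

    unsigned long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));  // call site 2
    return counter;
}

// Hypothetical helper: counts the same permutations, but the only call to
// std::next_permutation is in the loop condition. Assumes n >= 2.
unsigned long count_one_call_site(int n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);

    unsigned long counter = 0;
    bool skip = true;  // do not count the initial permutation
    do {
        if (!skip) {
            ++counter;
        }
        skip = false;
    } while (std::next_permutation(v.begin(), v.end()));  // single call site
    return counter;
}
```

Both functions return n! - 1 for n >= 2. Calling each with n = 12 from main, wrapped in the same std::chrono timing code as in the question and compiled with the same -O2 or -O3 flags, shows whether the number of call sites makes a difference on a given compiler.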

Regarding "c++ - Extreme slowdown when starting at the second permutation", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44749074/
