c++ - Why isn't this C++ member function optimized by the compiler with -O3?

The norm member function of the C++ vector class declared below is marked const and (as far as I can tell) has no side effects.

template <unsigned int N>
struct vector {
  double v[N];

  double norm() const {
    double ret = 0;
    for (int i=0; i<N; ++i) {
      ret += v[i]*v[i];
    }
    return ret;
  }
};

double test(const vector<100>& x) {
  return x.norm() + x.norm();
}

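If I call norm several times on a const instance of vector (see the test function above) with the gcc compiler (version 5.4) and optimizations turned on (i.e. -O3), the compiler inlines norm but still computes its result several times, even though the result should not change. Why doesn't the compiler optimize away the second call to norm and compute the result only once? This answer seems to suggest that the compiler should perform this optimization if it determines that norm has no side effects. Why doesn't that happen here?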

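Note that I'm using Compiler Explorer to see what the compiler generates; the gcc 5.4 assembly output is given below. The clang compiler gives similar results. Also note that if I manually mark norm as a const function with gcc's __attribute__((const)) compiler attribute, the second call is optimized away as I want. My question is why gcc (and clang) don't do this automatically, given that the definition of norm is available.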
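For reference, a minimal sketch of the annotated variant described above, assuming GCC or Clang, since __attribute__((const)) is a compiler extension and not standard C++. GCC's documentation reserves const for functions that do not read memory through pointers; norm() reads v through this, so pure is the more precise attribute, and the stronger const promise is only safe while the array cannot change between calls, as in test.

```cpp
// Sketch: the question's vector with norm() annotated via a GCC/Clang
// extension. Not standard C++; other compilers may reject the attribute.
template <unsigned int N>
struct vector {
  double v[N];

  // `const` promises the result depends only on the arguments. norm() reads
  // v through `this`, so GCC's documentation would call for `pure` instead;
  // `const` is only safe while the array cannot change between calls.
  __attribute__((const)) double norm() const {
    double ret = 0;
    for (unsigned int i = 0; i < N; ++i) {
      ret += v[i] * v[i];
    }
    return ret;
  }
};

double test(const vector<100>& x) {
  // With the attribute, GCC may evaluate norm() once and reuse the result.
  return x.norm() + x.norm();
}
```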

test(vector<100u>&):
        pxor    xmm2, xmm2
        lea     rdx, [rdi+800]
        mov     rax, rdi
.L2:
        movsd   xmm1, QWORD PTR [rax]
        add     rax, 8
        cmp     rdx, rax
        mulsd   xmm1, xmm1
        addsd   xmm2, xmm1
        jne     .L2
        pxor    xmm0, xmm0
.L3:
        movsd   xmm1, QWORD PTR [rdi]
        add     rdi, 8
        cmp     rdx, rdi
        mulsd   xmm1, xmm1
        addsd   xmm0, xmm1
        jne     .L3
        addsd   xmm0, xmm2
        ret

Best answer

The compiler can compute the result of norm once and reuse it. For example, with the -Os switch:

test(vector<100u> const&):
        xorps   xmm0, xmm0
        xor     eax, eax
.L2:
        movsd   xmm1, QWORD PTR [rdi+rax]
        add     rax, 8
        cmp     rax, 800
        mulsd   xmm1, xmm1
        addsd   xmm0, xmm1
        jne     .L2
        addsd   xmm0, xmm0
        ret
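Read back as source, the -Os listing runs the loop once and then doubles the sum with addsd xmm0, xmm0. A sketch of equivalent C++ (test_hoisted is a hypothetical name), which is also a portable way to get a single evaluation without relying on the optimizer:

```cpp
// The question's vector, unchanged apart from an unsigned loop index.
template <unsigned int N>
struct vector {
  double v[N];

  double norm() const {
    double ret = 0;
    for (unsigned int i = 0; i < N; ++i) {
      ret += v[i] * v[i];
    }
    return ret;
  }
};

// Hypothetical hand-hoisted version of test(): norm() is evaluated once
// and the result is doubled, mirroring one loop plus addsd xmm0, xmm0.
double test_hoisted(const vector<100>& x) {
  const double n = x.norm();
  return n + n;
}
```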

The missing optimization is not due to non-associative floating-point math or to some observable-behavior issue.
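One way to see why floating-point semantics do not block this particular rewrite: regrouping additions can change a double result, but evaluating the identical expression twice in the identical order produces the same value, and doubling a finite double (n + n) is exact unless it overflows. A small sketch under default IEEE 754 settings (no -ffast-math); the function names are illustrative only:

```cpp
// Regrouping the same three additions changes the rounded result...
double grouped_left()  { return (0.1 + 0.2) + 0.3; }
double grouped_right() { return 0.1 + (0.2 + 0.3); }

// ...whereas repeating one computation in the same order does not.
// sum_squares accumulates left to right, like norm() in the question.
template <unsigned int N>
double sum_squares(const double (&v)[N]) {
  double ret = 0;
  for (unsigned int i = 0; i < N; ++i) {
    ret += v[i] * v[i];
  }
  return ret;
}
```

So replacing norm() + norm() with a single evaluation that is then doubled never changes the result, unlike a reordering of the loop's additions.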

"In a not properly mutexed environment another function might change the contents in the array in between the calls of norm."

That could happen, but it is not something the compiler has to care about (see e.g. https://stackoverflow.com/a/25472679/3235496).

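Compiling the example with the -O2 -fdump-tree-all switches, you can see that:

- g++ correctly detects vector<N>::norm() as a pure function (output file .local-pure-const1);
- inlining happens at an early stage (output file .einline).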

Also note that if norm is marked with __attribute__ ((noinline)), the compiler performs CSE:

test(vector<100u> const&):
    sub     rsp, 8
    call    vector<100u>::norm() const
    add     rsp, 8
    addsd   xmm0, xmm0
    ret
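The answer does not show the source behind this listing. A sketch of what it presumably compiled, assuming GCC's __attribute__((noinline)) extension applied to the question's norm(): with inlining suppressed, test keeps two identical calls to a function GCC has already classified as pure, and ordinary CSE merges them into one call followed by the doubling addsd xmm0, xmm0.

```cpp
// Sketch: the question's code with inlining of norm() suppressed through a
// GCC/Clang extension (not standard C++).
template <unsigned int N>
struct vector {
  double v[N];

  __attribute__((noinline)) double norm() const {
    double ret = 0;
    for (unsigned int i = 0; i < N; ++i) {
      ret += v[i] * v[i];
    }
    return ret;
  }
};

double test(const vector<100>& x) {
  // Two identical calls to a pure function: a candidate for plain CSE.
  return x.norm() + x.norm();
}
```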

Marc Glisse is (probably) right.

A more advanced form of common subexpression elimination is needed to un-inline the recurring expression.
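To make the point concrete: after inlining, test no longer contains two calls but two separate loops over the same array, roughly as in test_as_inlined below (a hand-written approximation of the -O3 listing, not actual compiler output). Turning it into test_after_cse requires proving that two whole loops compute the same value, which is harder than matching two identical call expressions. Both function names are hypothetical.

```cpp
// Approximation of test() after norm() has been inlined twice: two
// independent accumulation loops over the same 100 doubles.
double test_as_inlined(const double* v) {
  double a = 0;
  for (int i = 0; i < 100; ++i) a += v[i] * v[i];
  double b = 0;
  for (int i = 0; i < 100; ++i) b += v[i] * v[i];
  return b + a;
}

// What an optimizer able to "un-inline" the repeated loop would produce:
// one loop whose result is reused (compare the -Os listing above).
double test_after_cse(const double* v) {
  double a = 0;
  for (int i = 0; i < 100; ++i) a += v[i] * v[i];
  return a + a;
}
```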

The corresponding question on Stack Overflow: https://stackoverflow.com/questions/42619026/
