c - while(i--) optimization by gcc and clang: why don't they use sub/jnc?

Some people write code like this when they need a loop with no counter, or with a counter that runs n-1, ..., 0:

while (i--) { ... }
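For reference, here is a small compile-time sketch of what the idiom does with an unsigned counter. It assumes C++14 `constexpr` rules, and the helper names are hypothetical. The condition tests the old value of `i` and then decrements it, so the body runs `n` times and sees `n-1, ..., 0`. After the final test, `i` has wrapped around to `UINT_MAX`.

```cpp
#include <climits>

// Hypothetical helper: how many times the body of `while (i--)` runs from i = n.
constexpr unsigned iterations(unsigned n) {
    unsigned count = 0;
    unsigned i = n;
    while (i--) {        // test the old value, then decrement
        ++count;
    }
    return count;
}

// Hypothetical helper: sum of the counter values the body observes.
constexpr unsigned long long body_sum(unsigned n) {
    unsigned long long sum = 0;
    unsigned i = n;
    while (i--) {
        sum += i;        // body sees n-1, ..., 0
    }
    return sum;
}

// Hypothetical helper: value of i once the loop has exited.
constexpr unsigned value_after_loop(unsigned n) {
    unsigned i = n;
    while (i--) {
    }
    return i;            // the last test saw 0, so i wrapped to UINT_MAX
}

static_assert(iterations(5) == 5, "runs n times");
static_assert(body_sum(4) == 3 + 2 + 1 + 0, "body sees n-1, ..., 0");
static_assert(value_after_loop(4) == UINT_MAX, "i ends at UINT_MAX");
```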

A concrete example:

volatile int sink;
void countdown_i_used() {
    unsigned i = 1000;
    while (i--) {
         sink = i;  // if i is unused, gcc optimizes it away and uses dec/jnz
    }
}

With GCC 8.2 (on the Godbolt compiler explorer), it compiles to:

# gcc8.2 -O3 -march=haswell
.L2:
    mov     DWORD PTR sink[rip], eax
    dec     eax                      # with tune=generic,  sub eax, 1
    cmp     eax, -1
    jne     .L2
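Translated back into C++, the GCC loop above has roughly the following shape. This is a hand-written reconstruction, not compiler output. It assumes GCC peeled the first test because the starting value 1000 is a known non-zero constant. The function names are hypothetical, and C++14 `constexpr` is assumed so the equivalence can be checked at compile time.

```cpp
#include <climits>

// Source form: while (i--) with i starting at n.
constexpr unsigned long long sum_original(unsigned n) {
    unsigned long long sum = 0;
    unsigned i = n;
    while (i--) {
        sum += i;
    }
    return sum;
}

// Shape of the emitted loop (valid only for n != 0, which holds for n = 1000):
// start at n-1, run the body, decrement, and exit once the counter equals -1.
constexpr unsigned long long sum_gcc_shape(unsigned n) {
    unsigned long long sum = 0;
    unsigned i = n - 1;          // first test peeled: n is known to be non-zero
    do {
        sum += i;                // mov DWORD PTR sink[rip], eax
        --i;                     // dec eax
    } while (i != UINT_MAX);     // cmp eax, -1 / jne .L2
    return sum;
}

static_assert(sum_original(1000) == sum_gcc_shape(1000), "same iterations");
```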

On clang (https://godbolt.org/z/YxYZ95), if the counter is unused, it becomes

if(i) do {...} while(--i);

but if the counter is used, it becomes the same kind of loop as GCC's:

add esi, -1
cmp esi, -1
jnz lp
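The two clang shapes can be compared with a small sketch. The helper names are hypothetical, and C++14 `constexpr` is assumed. The rotated form `if(i) do {...} while(--i);` runs the body the same number of times. However, its body would see `n, ..., 1` instead of `n-1, ..., 0`. That is presumably why clang only uses this form, with its `dec`/`jnz`-style exit test, when the counter is unused.

```cpp
// Source form: while (i--) { ... }  (body would see n-1, ..., 0)
constexpr unsigned count_while(unsigned n) {
    unsigned count = 0;
    unsigned i = n;
    while (i--) {
        ++count;
    }
    return count;
}

// Rotated form used when i is unused: if (i) do { ... } while (--i);
// (body would see n, ..., 1; the exit test is on the decremented value)
constexpr unsigned count_rotated(unsigned n) {
    unsigned count = 0;
    unsigned i = n;
    if (i) {
        do {
            ++count;
        } while (--i);
    }
    return count;
}

static_assert(count_while(1000) == count_rotated(1000), "same trip count");
static_assert(count_while(0) == count_rotated(0), "both skip the body for n == 0");
```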

However, this seems like a better idea:

sub esi, 1
jnc lp

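Why don't the two compilers do it this way?

Is the cmp version actually better? Or does sub/jnc save no space and run at almost the same speed?

Or do they simply not consider this option?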
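The sub/jnc exit test and the cmp/-1 exit test are interchangeable. `sub x, 1` sets CF (borrow) exactly when the unsigned value `x` was 0, which is also exactly when the result equals -1 (`UINT_MAX`). Below is a minimal C++ model of this, assuming C++14 `constexpr`. `SubResult`, `sub1` and the other names are hypothetical. They only mimic the flag semantics and say nothing about which instructions a compiler emits.

```cpp
#include <climits>

// Model of the x86 instruction `sub x, 1` on an unsigned value.
struct SubResult {
    unsigned value;  // the register after the subtraction
    bool carry;      // CF: set when the subtraction borrowed
};

// A borrow happens exactly when x < 1, i.e. when x == 0.
constexpr SubResult sub1(unsigned x) {
    return SubResult{x - 1u, x < 1u};
}

// Exit test the compilers emit: compare the decremented value with -1.
constexpr bool keep_looping_cmp(unsigned x) {
    return sub1(x).value != UINT_MAX;   // cmp esi, -1 / jnz lp
}

// Exit test proposed in the question: keep looping while there is no borrow.
constexpr bool keep_looping_jnc(unsigned x) {
    return !sub1(x).carry;              // sub esi, 1 / jnc lp
}

// Whole loop with the sub/jnc exit test; counts body executions.
constexpr unsigned count_sub_jnc(unsigned n) {
    unsigned count = 0;
    unsigned i = n;
    for (;;) {
        SubResult r = sub1(i);  // sub esi, 1
        i = r.value;
        if (r.carry) break;     // jnc falls through once i was 0
        ++count;                // loop body, sees n-1, ..., 0
    }
    return count;
}

static_assert(keep_looping_cmp(0) == keep_looping_jnc(0), "both stop at 0");
static_assert(keep_looping_cmp(1) == keep_looping_jnc(1), "both continue at 1");
static_assert(count_sub_jnc(1000) == 1000, "same trip count as while (i--)");
```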

Update: even if I write the code to use the carry explicitly (here I use add/jc, but it is the same idea):

bool addcy(unsigned& a, unsigned b) {
    unsigned last_a = a;
    a+=b;
    return last_a+b<last_a;
}
volatile unsigned sink;
void f() {

    for (unsigned i=100; addcy(i, -1); ) {
        sink = i;
    }
}

the compilers still compile it as a check for equality with -1. However, if 100 is replaced with an input, the JC code remains.
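The following is a compile-time check that the carry-based loop from the update runs the body exactly `n` times, just like `while (i--)`. This sketch marks the question's `addcy` as `constexpr`, which is an assumption requiring C++14 or later. It also uses a hypothetical driver, `count_addcy_loop`. It only checks the loop's semantics, not the code GCC or clang generate for it.

```cpp
#include <climits>

// addcy from the question, marked constexpr (an assumption for this sketch)
// so the loop can be checked at compile time.
constexpr bool addcy(unsigned& a, unsigned b) {
    unsigned last_a = a;
    a += b;
    return last_a + b < last_a;  // true when the unsigned addition carried out
}

// Hypothetical driver: the question's loop with a count n instead of 100.
// Adding UINT_MAX subtracts 1 modulo 2^N (N = width of unsigned), and it
// carries out exactly when i != 0, so the loop continues while the old i was non-zero.
constexpr unsigned count_addcy_loop(unsigned n) {
    unsigned count = 0;
    for (unsigned i = n; addcy(i, UINT_MAX); ) {
        ++count;  // stands in for `sink = i;`, which sees n-1, ..., 0
    }
    return count;
}

static_assert(count_addcy_loop(100) == 100, "same trip count as while (i--)");
```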

Best answer

Yes, this is a missed optimization. Intel Sandybridge-family CPUs can macro-fuse sub/jcc into a single uop, so on those CPUs sub/jnc saves code size, x86 instructions, and uops.

On other CPUs (e.g. AMD, which can only fuse test/cmp with jcc), it still saves code size, so it is at least slightly better. It is not worse in any way.

It would be a good idea to report these as missed-optimization bugs at https://bugs.llvm.org and https://gcc.gnu.org/bugzilla/.

Regarding "c - while(i--) optimization by gcc and clang: why don't they use sub/jnc?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54278070/
