c - GCC 是否为静态分支预测生成次优代码?

标签 c gcc assembly x86 branch-prediction

从我的大学类(class)中,我听说,按照惯例,将更可能的条件放在 if 中比放在 else 中更好,这可能有助于 静态 分支预测器。例如:

if (check_collision(player, enemy)) { // very unlikely to be true
    doA();
} else {
    doB();
}

可以重写为:

if (!check_collision(player, enemy)) {
    doB();
} else {
    doA();
}

我找到了一篇博文 Branch Patterns, Using GCC ,它更详细地解释了这种现象:

Forward branches are generated for if statements. The rationale for making them not likely to be taken is that the processor can take advantage of the fact that instructions following the branch instruction may already be placed in the instruction buffer inside the Instruction Unit.

在它旁边,它说(强调我的):

When writing an if-else statement, always make the "then" block more likely to be executed than the else block, so the processor can take advantage of instructions already placed in the instruction fetch buffer.

最终,有文章,由 Intel 撰写,Branch and Loop Reorganization to Prevent Mispredicts ,它用两条规则总结了这一点:

Static branch prediction is used when there is no data collected by the microprocessor when it encounters a branch, which is typically the first time a branch is encountered. The rules are simple:

  • A forward branch defaults to not taken
  • A backward branch defaults to taken

In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common.

据我所知,这个想法是流水线 CPU 可以遵循指令缓存中的指令,而不会通过跳转到代码段内的另一个地址来破坏它。不过,我知道,在现代 CPU 微架构的情况下,这可能被大大简化了。

但是,GCC 似乎不遵守这些规则。给定代码:

extern void foo();
extern void bar();

int some_func(int n)
{
    if (n) {
        foo();
    }
    else {
        bar();
    }
    return 0;
}

它生成(版本 6.3.0 带有 -O3 -mtune=intel):

some_func:
        lea     rsp, [rsp-8]
        xor     eax, eax
        test    edi, edi
        jne     .L6            ; here, forward branch if (n) is (conditionally) taken
        call    bar
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret
.L6:
        call    foo
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret

我发现强制执行所需行为的唯一方法是使用 __builtin_expect 重写 if 条件。如下:

if (__builtin_expect(n, 1)) { // force n condition to be treated as true

所以汇编代码会变成:

some_func:
        lea     rsp, [rsp-8]
        xor     eax, eax
        test    edi, edi
        je      .L2             ; here, backward branch is (conditionally) taken
        call    foo
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret
.L2:
        call    bar
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret

最佳答案

简短的回答:不,不是。

GCC 进行了大量非平凡的优化,其中之一是通过控制流图来猜测分支概率。

根据 GCC manual :

fno-guess-branch-probability

Do not guess branch probabilities using heuristics.

GCC uses heuristics to guess branch probabilities if they are not provided by profiling feedback (-fprofile-arcs). These heuristics are based on the control flow graph. If some branch probabilities are specified by __builtin_expect, then the heuristics are used to guess branch probabilities for the rest of the control flow graph, taking the __builtin_expect info into account. The interactions between the heuristics and __builtin_expect can be complex, and in some cases, it may be useful to disable the heuristics so that the effects of __builtin_expect are easier to understand.

-freorder-blocks 也可以交换分支。

此外,正如 OP 提到的,该行为可能会被 __builtin_expect 覆盖。

证明

查看以下 list 。

void doA() { printf("A\n"); }
void doB() { printf("B\n"); }
int check_collision(void* a, void* b)
{ return a == b; }

void some_func (void* player, void* enemy) {
    if (check_collision(player, enemy)) {
        doA();
    } else {
        doB();
    }
}

int main() {
    // warming up gcc statistic
    some_func((void*)0x1, NULL);
    some_func((void*)0x2, NULL);
    some_func((void*)0x3, NULL);
    some_func((void*)0x4, NULL);
    some_func((void*)0x5, NULL);
    some_func(NULL, NULL);
    return 0;
}

很明显,check_collision 大多数时候会返回 0。所以,doB() 分支很可能是 GCC 可以猜到的:

gcc -O main.c -o opt.a
objdump -d opt.a

some_func 的汇编是:

sub    $0x8,%rsp
cmp    %rsi,%rdi
je     6c6 <some_func+0x18>
mov    $0x0,%eax
callq  68f <doB>
add    $0x8,%rsp
retq   
mov    $0x0,%eax
callq  67a <doA>
jmp    6c1 <some_func+0x13>

但可以肯定的是,我们可以强制 GCC 避免过于智能:

gcc -fno-guess-branch-probability main.c -o non-opt.a
objdump -d non-opt.a

我们将得到:

push   %rbp
mov    %rsp,%rbp
sub    $0x10,%rsp
mov    %rdi,-0x8(%rbp)
mov    %rsi,-0x10(%rbp)
mov    -0x10(%rbp),%rdx
mov    -0x8(%rbp),%rax
mov    %rdx,%rsi
mov    %rax,%rdi
callq  6a0 <check_collision>
test   %eax,%eax
je     6ef <some_func+0x33>
mov    $0x0,%eax
callq  67a <doA>
jmp    6f9 <some_func+0x3d>
mov    $0x0,%eax
callq  68d <doB>
nop
leaveq 
retq  

因此 GCC 将按照源代码顺序留下分支。

我使用 gcc 7.1.1 进行这些测试。

关于c - GCC 是否为静态分支预测生成次优代码?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41880779/

相关文章:

c - 生成的汇编代码包含一系列到下一个内存位置的跳转命令,目的是什么

arrays - 查找数组,汇编语言中的差距总和

c - 如何在 Code::Blocks 中添加另一个 C 文件

assembly - 气体 : too many memory reference

C语言 float 脑力破坏者

python - 如何在c中扩展python?

c - 我如何确定此 C 结构在 32 位和 64 位系统上均已打包? "__attribute__((packed))"总是必要的吗?

c - 我在 OSX 上的 gcc 中不断得到 "Segmentation fault: 11",但在 Windows 上工作

c - 从套接字读取时如何处理阻塞 read() 调用?

c - 使用 tcp_nodelay 发送大消息时出现奇怪的延迟