从我的大学类(class)中,我听说,按照惯例,将更可能的条件放在 if
中比放在 else
中更好,这可能有助于 静态 分支预测器。例如:
if (check_collision(player, enemy)) { // very unlikely to be true
doA();
} else {
doB();
}
可以重写为:
if (!check_collision(player, enemy)) {
doB();
} else {
doA();
}
我找到了一篇博文 Branch Patterns, Using GCC ,它更详细地解释了这种现象:
Forward branches are generated for if statements. The rationale for making them not likely to be taken is that the processor can take advantage of the fact that instructions following the branch instruction may already be placed in the instruction buffer inside the Instruction Unit.
在它旁边,它说(强调我的):
When writing an if-else statement, always make the "then" block more likely to be executed than the else block, so the processor can take advantage of instructions already placed in the instruction fetch buffer.
最终,有文章,由 Intel 撰写,Branch and Loop Reorganization to Prevent Mispredicts ,它用两条规则总结了这一点:
Static branch prediction is used when there is no data collected by the microprocessor when it encounters a branch, which is typically the first time a branch is encountered. The rules are simple:
- A forward branch defaults to not taken
- A backward branch defaults to taken
In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common.
据我所知,这个想法是流水线 CPU 可以遵循指令缓存中的指令,而不会通过跳转到代码段内的另一个地址来破坏它。不过,我知道,在现代 CPU 微架构的情况下,这可能被大大简化了。
但是,GCC 似乎不遵守这些规则。给定代码:
extern void foo();
extern void bar();
int some_func(int n)
{
if (n) {
foo();
}
else {
bar();
}
return 0;
}
它生成(版本 6.3.0 带有 -O3 -mtune=intel
):
some_func:
lea rsp, [rsp-8]
xor eax, eax
test edi, edi
jne .L6 ; here, forward branch if (n) is (conditionally) taken
call bar
xor eax, eax
lea rsp, [rsp+8]
ret
.L6:
call foo
xor eax, eax
lea rsp, [rsp+8]
ret
我发现强制执行所需行为的唯一方法是使用 __builtin_expect
重写 if
条件。如下:
if (__builtin_expect(n, 1)) { // force n condition to be treated as true
所以汇编代码会变成:
some_func:
lea rsp, [rsp-8]
xor eax, eax
test edi, edi
je .L2 ; here, backward branch is (conditionally) taken
call foo
xor eax, eax
lea rsp, [rsp+8]
ret
.L2:
call bar
xor eax, eax
lea rsp, [rsp+8]
ret
最佳答案
简短的回答:不,不是。
GCC 进行了大量非平凡的优化,其中之一是通过控制流图来猜测分支概率。
根据 GCC manual :
fno-guess-branch-probability
Do not guess branch probabilities using heuristics.
GCC uses heuristics to guess branch probabilities if they are not provided by profiling feedback (
-fprofile-arcs
). These heuristics are based on the control flow graph. If some branch probabilities are specified by__builtin_expect
, then the heuristics are used to guess branch probabilities for the rest of the control flow graph, taking the__builtin_expec
t info into account. The interactions between the heuristics and__builtin_expect
can be complex, and in some cases, it may be useful to disable the heuristics so that the effects of__builtin_expect
are easier to understand.
-freorder-blocks
也可以交换分支。
此外,正如 OP 提到的,该行为可能会被 __builtin_expect
覆盖。
证明
查看以下 list 。
void doA() { printf("A\n"); }
void doB() { printf("B\n"); }
int check_collision(void* a, void* b)
{ return a == b; }
void some_func (void* player, void* enemy) {
if (check_collision(player, enemy)) {
doA();
} else {
doB();
}
}
int main() {
// warming up gcc statistic
some_func((void*)0x1, NULL);
some_func((void*)0x2, NULL);
some_func((void*)0x3, NULL);
some_func((void*)0x4, NULL);
some_func((void*)0x5, NULL);
some_func(NULL, NULL);
return 0;
}
很明显,check_collision
大多数时候会返回 0
。所以,doB()
分支很可能是 GCC 可以猜到的:
gcc -O main.c -o opt.a
objdump -d opt.a
some_func
的汇编是:
sub $0x8,%rsp
cmp %rsi,%rdi
je 6c6 <some_func+0x18>
mov $0x0,%eax
callq 68f <doB>
add $0x8,%rsp
retq
mov $0x0,%eax
callq 67a <doA>
jmp 6c1 <some_func+0x13>
但可以肯定的是,我们可以强制 GCC 避免过于智能:
gcc -fno-guess-branch-probability main.c -o non-opt.a
objdump -d non-opt.a
我们将得到:
push %rbp
mov %rsp,%rbp
sub $0x10,%rsp
mov %rdi,-0x8(%rbp)
mov %rsi,-0x10(%rbp)
mov -0x10(%rbp),%rdx
mov -0x8(%rbp),%rax
mov %rdx,%rsi
mov %rax,%rdi
callq 6a0 <check_collision>
test %eax,%eax
je 6ef <some_func+0x33>
mov $0x0,%eax
callq 67a <doA>
jmp 6f9 <some_func+0x3d>
mov $0x0,%eax
callq 68d <doB>
nop
leaveq
retq
因此 GCC 将按照源代码顺序留下分支。
我使用 gcc 7.1.1 进行这些测试。
关于c - GCC 是否为静态分支预测生成次优代码?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41880779/