c - 'switch' 比 'if' 快吗？

switch 语句实际上是否比if 语句快？

我使用 /Ox 标志在 Visual Studio 2010 的 x64 C++ 编译器上运行了以下代码:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define MAX_COUNT (1 << 29)
size_t counter = 0;

size_t testSwitch()
{
    clock_t start = clock();
    size_t i;
    for (i = 0; i < MAX_COUNT; i++)
    {
        switch (counter % 4 + 1)
        {
            case 1: counter += 4; break;
            case 2: counter += 3; break;
            case 3: counter += 2; break;
            case 4: counter += 1; break;
        }
    }
    return 1000 * (clock() - start) / CLOCKS_PER_SEC;
}

size_t testIf()
{
    clock_t start = clock();
    size_t i;
    for (i = 0; i < MAX_COUNT; i++)
    {
        const size_t c = counter % 4 + 1;
        if (c == 1) { counter += 4; }
        else if (c == 2) { counter += 3; }
        else if (c == 3) { counter += 2; }
        else if (c == 4) { counter += 1; }
    }
    return 1000 * (clock() - start) / CLOCKS_PER_SEC;
}

int main()
{
    printf("Starting...\n");
    printf("Switch statement: %u ms\n", testSwitch());
    printf("If     statement: %u ms\n", testIf());
}

得到这些结果:

Switch statement: 5261 ms
If statement: 5196 ms

据我所知，switch 语句显然使用跳转表来优化分支。

问题:

在 x86 或 x64 中，基本的跳转表是什么样的？
这段代码是否使用了跳转表？
为什么在这个例子中没有性能差异？是否有任何情况存在显着的性能差异？

代码反汇编:

testIf:

13FE81B10 sub  rsp,48h 
13FE81B14 call qword ptr [__imp_clock (13FE81128h)] 
13FE81B1A mov  dword ptr [start],eax 
13FE81B1E mov  qword ptr [i],0 
13FE81B27 jmp  testIf+26h (13FE81B36h) 
13FE81B29 mov  rax,qword ptr [i] 
13FE81B2E inc  rax  
13FE81B31 mov  qword ptr [i],rax 
13FE81B36 cmp  qword ptr [i],20000000h 
13FE81B3F jae  testIf+0C3h (13FE81BD3h) 
13FE81B45 xor  edx,edx 
13FE81B47 mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81B4E mov  ecx,4 
13FE81B53 div  rax,rcx 
13FE81B56 mov  rax,rdx 
13FE81B59 inc  rax  
13FE81B5C mov  qword ptr [c],rax 
13FE81B61 cmp  qword ptr [c],1 
13FE81B67 jne  testIf+6Dh (13FE81B7Dh) 
13FE81B69 mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81B70 add  rax,4 
13FE81B74 mov  qword ptr [counter (13FE835D0h)],rax 
13FE81B7B jmp  testIf+0BEh (13FE81BCEh) 
13FE81B7D cmp  qword ptr [c],2 
13FE81B83 jne  testIf+89h (13FE81B99h) 
13FE81B85 mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81B8C add  rax,3 
13FE81B90 mov  qword ptr [counter (13FE835D0h)],rax 
13FE81B97 jmp  testIf+0BEh (13FE81BCEh) 
13FE81B99 cmp  qword ptr [c],3 
13FE81B9F jne  testIf+0A5h (13FE81BB5h) 
13FE81BA1 mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81BA8 add  rax,2 
13FE81BAC mov  qword ptr [counter (13FE835D0h)],rax 
13FE81BB3 jmp  testIf+0BEh (13FE81BCEh) 
13FE81BB5 cmp  qword ptr [c],4 
13FE81BBB jne  testIf+0BEh (13FE81BCEh) 
13FE81BBD mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81BC4 inc  rax  
13FE81BC7 mov  qword ptr [counter (13FE835D0h)],rax 
13FE81BCE jmp  testIf+19h (13FE81B29h) 
13FE81BD3 call qword ptr [__imp_clock (13FE81128h)] 
13FE81BD9 sub  eax,dword ptr [start] 
13FE81BDD imul eax,eax,3E8h 
13FE81BE3 cdq       
13FE81BE4 mov  ecx,3E8h 
13FE81BE9 idiv eax,ecx 
13FE81BEB cdqe      
13FE81BED add  rsp,48h 
13FE81BF1 ret

testSwitch:

13FE81C00 sub  rsp,48h 
13FE81C04 call qword ptr [__imp_clock (13FE81128h)] 
13FE81C0A mov  dword ptr [start],eax 
13FE81C0E mov  qword ptr [i],0 
13FE81C17 jmp  testSwitch+26h (13FE81C26h) 
13FE81C19 mov  rax,qword ptr [i] 
13FE81C1E inc  rax  
13FE81C21 mov  qword ptr [i],rax 
13FE81C26 cmp  qword ptr [i],20000000h 
13FE81C2F jae  testSwitch+0C5h (13FE81CC5h) 
13FE81C35 xor  edx,edx 
13FE81C37 mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81C3E mov  ecx,4 
13FE81C43 div  rax,rcx 
13FE81C46 mov  rax,rdx 
13FE81C49 inc  rax  
13FE81C4C mov  qword ptr [rsp+30h],rax 
13FE81C51 cmp  qword ptr [rsp+30h],1 
13FE81C57 je   testSwitch+73h (13FE81C73h) 
13FE81C59 cmp  qword ptr [rsp+30h],2 
13FE81C5F je   testSwitch+87h (13FE81C87h) 
13FE81C61 cmp  qword ptr [rsp+30h],3 
13FE81C67 je   testSwitch+9Bh (13FE81C9Bh) 
13FE81C69 cmp  qword ptr [rsp+30h],4 
13FE81C6F je   testSwitch+0AFh (13FE81CAFh) 
13FE81C71 jmp  testSwitch+0C0h (13FE81CC0h) 
13FE81C73 mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81C7A add  rax,4 
13FE81C7E mov  qword ptr [counter (13FE835D0h)],rax 
13FE81C85 jmp  testSwitch+0C0h (13FE81CC0h) 
13FE81C87 mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81C8E add  rax,3 
13FE81C92 mov  qword ptr [counter (13FE835D0h)],rax 
13FE81C99 jmp  testSwitch+0C0h (13FE81CC0h) 
13FE81C9B mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81CA2 add  rax,2 
13FE81CA6 mov  qword ptr [counter (13FE835D0h)],rax 
13FE81CAD jmp  testSwitch+0C0h (13FE81CC0h) 
13FE81CAF mov  rax,qword ptr [counter (13FE835D0h)] 
13FE81CB6 inc  rax  
13FE81CB9 mov  qword ptr [counter (13FE835D0h)],rax 
13FE81CC0 jmp  testSwitch+19h (13FE81C19h) 
13FE81CC5 call qword ptr [__imp_clock (13FE81128h)] 
13FE81CCB sub  eax,dword ptr [start] 
13FE81CCF imul eax,eax,3E8h 
13FE81CD5 cdq       
13FE81CD6 mov  ecx,3E8h 
13FE81CDB idiv eax,ecx 
13FE81CDD cdqe      
13FE81CDF add  rsp,48h 
13FE81CE3 ret

更新:

有趣的结果 here .不过，不确定为什么一个更快，一个更慢。

最佳答案

编译器可以对开关进行多种优化。我不认为经常提到的“跳转表”是一个非常有用的，因为它只在输入可以以某种方式有界时才有效。

“跳转表”的 C 伪代码类似于 this -- 请注意，编译器实际上需要在表格周围插入某种形式的 if 测试，以确保输入在表格中有效。另请注意，它仅适用于输入是一系列连续数字的特定情况。

如果 switch 中的分支数非常大，编译器可以对 switch 的值进行二进制搜索，这(在我看来)将是一个更有用的优化，因为它确实显着增加了在某些情况下，性能与开关一样通用，并且不会导致生成的代码量更大。但是要看到这一点，您的测试代码需要更多分支才能看到任何差异。

回答您的具体问题:

Clang 生成一个看起来像 this 的:

test_switch(char):                       # @test_switch(char)
        movl    %edi, %eax
        cmpl    $19, %edi
        jbe     .LBB0_1
        retq
.LBB0_1:
        jmpq    *.LJTI0_0(,%rax,8)
        jmp     void call<0u>()         # TAILCALL
        jmp     void call<1u>()         # TAILCALL
        jmp     void call<2u>()         # TAILCALL
        jmp     void call<3u>()         # TAILCALL
        jmp     void call<4u>()         # TAILCALL
        jmp     void call<5u>()         # TAILCALL
        jmp     void call<6u>()         # TAILCALL
        jmp     void call<7u>()         # TAILCALL
        jmp     void call<8u>()         # TAILCALL
        jmp     void call<9u>()         # TAILCALL
        jmp     void call<10u>()        # TAILCALL
        jmp     void call<11u>()        # TAILCALL
        jmp     void call<12u>()        # TAILCALL
        jmp     void call<13u>()        # TAILCALL
        jmp     void call<14u>()        # TAILCALL
        jmp     void call<15u>()        # TAILCALL
        jmp     void call<16u>()        # TAILCALL
        jmp     void call<17u>()        # TAILCALL
        jmp     void call<18u>()        # TAILCALL
        jmp     void call<19u>()        # TAILCALL
.LJTI0_0:
        .quad   .LBB0_2
        .quad   .LBB0_3
        .quad   .LBB0_4
        .quad   .LBB0_5
        .quad   .LBB0_6
        .quad   .LBB0_7
        .quad   .LBB0_8
        .quad   .LBB0_9
        .quad   .LBB0_10
        .quad   .LBB0_11
        .quad   .LBB0_12
        .quad   .LBB0_13
        .quad   .LBB0_14
        .quad   .LBB0_15
        .quad   .LBB0_16
        .quad   .LBB0_17
        .quad   .LBB0_18
        .quad   .LBB0_19
        .quad   .LBB0_20
        .quad   .LBB0_21

我可以说它没有使用跳转表——4条比较指令清晰可见:

13FE81C51 cmp  qword ptr [rsp+30h],1 
13FE81C57 je   testSwitch+73h (13FE81C73h) 
13FE81C59 cmp  qword ptr [rsp+30h],2 
13FE81C5F je   testSwitch+87h (13FE81C87h) 
13FE81C61 cmp  qword ptr [rsp+30h],3 
13FE81C67 je   testSwitch+9Bh (13FE81C9Bh) 
13FE81C69 cmp  qword ptr [rsp+30h],4 
13FE81C6F je   testSwitch+0AFh (13FE81CAFh)

基于跳转表的解决方案根本不使用比较。

要么没有足够的分支导致编译器生成跳转表，要么您的编译器根本不生成它们。我不确定是哪个。

EDIT 2014:熟悉 LLVM 优化器的人在其他地方进行了一些讨论，称跳转表优化在许多情况下都很重要；例如在存在具有许多值的枚举和许多反对所述枚举中的值的情况下。也就是说，我坚持我在 2011 年所说的话——我经常看到人们在想“如果我改变它，无论我有多少个病例，它都会是同一时间”——这是完全错误的。即使使用跳转表，您也会获得间接跳转成本，并为每种情况支付表中的条目；内存带宽对现代硬件来说是个大问题。

为可读性编写代码。 Any compiler worth its salt is going to see an if / else if ladder and transform it into equivalent switch or vice versa if it would be faster to do so.

关于c - 'switch' 比 'if' 快吗？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/6805026/

c - 'switch' 比 'if' 快吗？

问题:

更新:

上一篇：c - C语言中位反转(从MSB-> LSB到LSB-> MSB)的高效算法

下一篇：c - C中如何实现函数重载？