c - Atomic operations and code generation for gcc

I am looking at some of the assembly that gcc generates for atomic operations. I tried the following short sequence:

int x1;
int x2;

int foo;

void test()
{
  __atomic_store_n( &x1, 1, __ATOMIC_SEQ_CST );
  if( __atomic_load_n( &x2  ,__ATOMIC_SEQ_CST ))
    return;

  foo = 4;
}

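In his atomic<> Weapons talk, in the part on code generation, Herb Sutter mentions that the x86 manual mandates using xchg for atomic stores and a simple mov for atomic reads. So I was expecting something along the lines of: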

test():
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $1, %eax
    xchg    %eax, x1(%rip)
    movl    x2(%rip), %eax
    testl   %eax, %eax
    setne   %al
    testb   %al, %al
    je      .L2
    jmp     .L1
.L2:
    movl    $4, foo(%rip)
.L1:
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Here the memory fence is implicit because of the locked xchg instruction.

However, if I compile this with gcc -march=core2 -S test.cc, I get the following:

test():
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $1, %eax
    movl    %eax, x1(%rip)
    mfence
    movl    x2(%rip), %eax
    testl   %eax, %eax
    setne   %al
    testb   %al, %al
    je      .L2
    jmp     .L1
.L2:
    movl    $4, foo(%rip)
.L1:
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

So instead of an xchg operation, gcc uses a mov + mfence combination here. Why does this code generation differ from the one that, according to Herb Sutter, the x86 architecture mandates?

Best answer

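The xchg instruction has implied lock semantics when the destination is a memory location. This means you can atomically swap the contents of a register with the contents of a memory location.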
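As a small illustration of such an atomic swap, gcc's `__atomic_exchange_n` builtin exchanges a value with a memory location in one step; on x86 it is typically compiled to an xchg with a memory operand. The names `flag` and `swap_in` are made up for this sketch.

```c
/* Hypothetical shared variable, used only for this sketch. */
static int flag = 0;

/* Atomically store new_value into flag and return the value it replaced.
   On x86, gcc typically emits an xchg with a memory operand for this. */
int swap_in(int new_value)
{
    return __atomic_exchange_n(&flag, new_value, __ATOMIC_SEQ_CST);
}
```

Compiling this with `gcc -O2 -S` is an easy way to check which instruction a given gcc version actually picks.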

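The example in the question performs an atomic store, not a swap. The x86 memory model guarantees that, in a multi-processor/multi-core system, stores done by one thread are seen by other threads in the order they were made, so a plain mov is enough for the store itself. What a plain mov does not prevent is a later load being performed before the store becomes visible to other cores; a sequentially consistent store has to rule that out, which is what the mfence after the mov (or the implicitly locked xchg) does. Having said that, there are some older Intel CPUs and some clones with bugs in this area, and on those CPUs xchg is required as a workaround. See the "Significant optimizations" section of this Wikipedia article on spinlocks: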

http://en.wikipedia.org/wiki/Spinlock#Example_implementation

which states:

The simple implementation above works on all CPUs using the x86 architecture. However, a number of performance optimizations are possible:

On later implementations of the x86 architecture, spin_unlock can safely use an unlocked MOV instead of the slower locked XCHG. This is due to subtle memory ordering rules which support this, even though MOV is not a full memory barrier. However, some processors (some Cyrix processors, some revisions of the Intel Pentium Pro (due to bugs), and earlier Pentium and i486 SMP systems) will do the wrong thing and data protected by the lock could be corrupted. On most non-x86 architectures, explicit memory barrier or atomic instructions (as in the example) must be used. On some systems, such as IA-64, there are special "unlock" instructions which provide the needed memory ordering.
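The unlock optimization described in the quote can be sketched with gcc's `__atomic` builtins. This is only a minimal illustration, not the Wikipedia implementation; `lock_word`, `spin_trylock`, `spin_lock` and `spin_unlock` are hypothetical names.

```c
/* Hypothetical lock word for this sketch: 0 = free, 1 = held. */
static int lock_word = 0;

/* Try once to take the lock; returns 1 on success, 0 if it is already held.
   The atomic exchange is typically compiled to xchg on x86. */
int spin_trylock(void)
{
    return __atomic_exchange_n(&lock_word, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Spin until the lock has been acquired. */
void spin_lock(void)
{
    while (!spin_trylock())
        ;  /* busy-wait */
}

/* Release the lock with a release store. On x86 this is typically a
   plain mov, with no xchg and no mfence. */
void spin_unlock(void)
{
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
}
```

The lock side needs the atomic read-modify-write, while the unlock side only needs release ordering, which ordinary x86 stores already provide.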

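The memory barrier, mfence, makes sure that all earlier stores have completed (the store buffer in the CPU core has drained and the values are in cache or memory). It also ensures that no later loads are executed ahead of it.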
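One way to see where the barrier comes from is to compile the same store with two different memory orders and compare the output of `gcc -O2 -S`. The sketch below uses the hypothetical names `ready`, `store_release`, `store_seq_cst` and `load_seq_cst`. On x86-64 the release store is typically a plain mov, while the sequentially consistent store is typically a mov followed by mfence or a single xchg, depending on the compiler version.

```c
/* Hypothetical shared variable, used only for this sketch. */
static int ready = 0;

/* Release store: ordered after earlier loads and stores,
   but later loads may still be performed before it. */
void store_release(int v)
{
    __atomic_store_n(&ready, v, __ATOMIC_RELEASE);
}

/* Sequentially consistent store: additionally keeps later
   sequentially consistent loads from being performed before it. */
void store_seq_cst(int v)
{
    __atomic_store_n(&ready, v, __ATOMIC_SEQ_CST);
}

/* Sequentially consistent load: typically a plain mov on x86. */
int load_seq_cst(void)
{
    return __atomic_load_n(&ready, __ATOMIC_SEQ_CST);
}
```

The two stores behave identically in a single thread; the difference only shows up in the generated instructions and in which orderings other threads are allowed to observe.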

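The fact that a MOV is sufficient to unlock the mutex (no serialization or memory barrier required) was "officially" clarified by an Intel architect in a reply to Linus Torvalds back in 1999: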

http://lkml.org/lkml/1999/11/24/90 .

I guess it was later discovered that this did not work on some older x86 processors.

Regarding c - Atomic operations and code generation for gcc, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22282689/
