c - Using memory barriers to force in-order execution

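Continuing with my idea that, by combining software and hardware memory barriers, I can disable out-of-order optimizations for a specific function in code compiled with compiler optimizations, and therefore implement software semaphores with algorithms such as Peterson's or Dekker's that require no out-of-order execution, I tested the following code, which contains the SW barrier asm volatile("":::"memory") and the gcc builtin HW barrier __sync_synchronize: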

#include <stdio.h>
int main(int argc, char ** argv)
{
    int x=0;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=1;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=2;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=3;
    printf("%d",x);
    return 0;
}

But the compiler output is:

main:
.LFB24:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    mfence
    mfence
    movl    $3, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    mfence
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp

If I remove the barriers and compile again, I get:

main:
.LFB24:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $3, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp

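Both were compiled with gcc -Wall -O2 on Ubuntu 14.04.1 LTS, x86.

I expected the output for the code with memory barriers to contain every assignment from my source code, with an mfence between each of them.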

According to a related Stack Overflow post:

gcc memory barrier __sync_synchronize vs asm volatile("": : :"memory")

When adding your inline assembly on each iteration, gcc is not permitted to change the order of the operations past the barrier

And later:

However, when the CPU performs this code, it's permitted to reorder the operations "under the hood", as long as it does not break memory ordering model. This means that performing the operations can be done out of order (if the CPU supports that, as most do these days). A HW fence would have prevented that.
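The compiler/CPU distinction in the quote maps onto two standard C11 fences. As a sketch, assuming a C11 compiler with <stdatomic.h>: atomic_signal_fence(memory_order_seq_cst) is, in GCC's implementation, a compiler-only barrier that emits no instruction (roughly like asm volatile("":::"memory")), while atomic_thread_fence(memory_order_seq_cst) also orders the CPU and typically compiles to an mfence or a locked instruction on x86-64 (like __sync_synchronize). The global shared and both function names are hypothetical; shared is a global so that its stores are observable outside the function.

```c
#include <stdatomic.h>

/* Hypothetical global: unlike a local whose address never escapes,
   stores to it are visible outside the function. */
int shared;

/* Compiler-only barrier: GCC treats it as a compiler barrier, so the
   store of 1 is not moved or dropped across it, but no fence
   instruction is emitted and the CPU may still reorder. */
void store_with_compiler_fence(void)
{
    shared = 1;
    atomic_signal_fence(memory_order_seq_cst);
    shared = 2;
}

/* Full fence: restrains the compiler and also emits a hardware
   fence (e.g. mfence or a locked instruction on x86-64). */
void store_with_thread_fence(void)
{
    shared = 1;
    atomic_thread_fence(memory_order_seq_cst);
    shared = 2;
}
```

Compiling with gcc -O2 -S and comparing the two functions is a quick way to see which fence produces an instruction.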

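But as you can see, the only difference between the code with memory barriers and the code without them is that the former contains mfence instructions, and not where I expected them; moreover, not all of the assignments are there.

Why doesn't the output for the version with memory barriers look the way I expected? Why was the order of the mfence instructions changed? Why did the compiler remove some of the assignments? Is the compiler allowed to make such optimizations even when memory barriers are applied and separate every line of code?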

References on memory barrier types and usage:

Memory Barriers - http://bruceblinn.com/linuxinfo/MemoryBarriers.html
GCC Atomic Builtins - https://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Atomic-Builtins.html

Best answer

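A memory barrier tells the compiler/CPU that instructions should not be reordered across the barrier; it does not mean that writes which can be proven pointless must be performed anyway.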
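One way to see why the barriers did not keep the stores: the question's x is a local whose address never escapes, so neither the "memory" clobber nor __sync_synchronize() gives any other code a way to observe it. The sketch below contrasts that with a global; g_x, barrier_local and barrier_global are hypothetical names, and __asm__ __volatile__ is the spelling of asm volatile that also compiles under strict -std=c11. With gcc -O2 -S you would expect barrier_local to reduce to returning 3 with only the mfences left, and barrier_global to keep a store to g_x before each fence.

```c
/* Hypothetical global counterpart of the question's local x. */
int g_x;

/* Local variable: no asm statement or fence can observe x, so GCC
   may fold all of this into "return 3" and keep only the fences. */
int barrier_local(void)
{
    int x = 0;
    __asm__ __volatile__("" : : : "memory");
    __sync_synchronize();
    x = 1;
    __asm__ __volatile__("" : : : "memory");
    __sync_synchronize();
    x = 2;
    __asm__ __volatile__("" : : : "memory");
    __sync_synchronize();
    x = 3;
    return x;
}

/* Global variable: the asm statement might read g_x, so each value
   must really be stored to memory before the next barrier. */
void barrier_global(void)
{
    g_x = 1;
    __asm__ __volatile__("" : : : "memory");
    __sync_synchronize();
    g_x = 2;
    __asm__ __volatile__("" : : : "memory");
    __sync_synchronize();
    g_x = 3;
}
```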

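If you declare x as volatile, the compiler can no longer assume that it is the only entity that cares about x's value, and it must follow the rules of the C abstract machine, which require the memory writes to actually happen.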

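In your particular case, you can then drop the barriers, because volatile accesses are already guaranteed not to be reordered with respect to each other.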
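A minimal sketch of the volatile version, with the barriers dropped (volatile_stores is a hypothetical name). Note that volatile only restrains the compiler: it emits no fence and does not make the accesses atomic, which is why the answer goes on to recommend _Atomic for inter-thread use.

```c
/* Every access to a volatile object is observable behaviour of the
   C abstract machine, so at -O2 GCC should emit all four stores to
   x's stack slot, in source order, with no barriers needed. */
int volatile_stores(void)
{
    volatile int x = 0;
    x = 1;
    x = 2;
    x = 3;
    return x; /* a real load from memory, not a folded constant */
}
```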

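If you have C11 support, you are better off using _Atomic, which additionally guarantees that ordinary assignments will not be reordered with respect to your x, and that the accesses are atomic.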
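A sketch of the ordering property described above, assuming C11 <stdatomic.h>: a plain store published through a seq_cst atomic flag. The names payload, ready, publish and try_consume are hypothetical. Writing ready = 1 directly on the _Atomic int would be equivalent to the atomic_store call.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names: a plain payload published via an atomic flag. */
static int payload;
static _Atomic int ready;

/* Producer: the plain store to payload cannot be moved after the
   seq_cst store to ready, by the compiler or by the CPU. */
void publish(int value)
{
    payload = value;
    atomic_store(&ready, 1); /* seq_cst by default */
}

/* Consumer: once ready is observed as 1, the payload write made
   before it is guaranteed to be visible. */
bool try_consume(int *out)
{
    if (atomic_load(&ready) == 0)
        return false;
    *out = payload;
    return true;
}
```

The ordering guarantee matters when publish and try_consume run on different threads; the same code also behaves correctly when called from a single thread.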

Edit: GCC (and clang too) seem to be inconsistent here and don't always perform this optimization. I opened a GCC bug report regarding this.
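Returning to the question's goal, here is a textbook sketch of Peterson's two-thread lock built on C11 seq_cst atomics instead of hand-placed barriers (interested, turn, peterson_lock and peterson_unlock are hypothetical names). The critical requirement is that each thread's store to its own flag is not reordered after its load of the other thread's flag; seq_cst atomics guarantee this, and on x86 that store-to-load ordering is exactly what needs an mfence or a locked instruction. It busy-waits and supports exactly two threads, so treat it as an illustration rather than a production lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-thread Peterson lock; thread ids are 0 and 1. */
static _Atomic bool interested[2];
static _Atomic int turn;

void peterson_lock(int self)
{
    int other = 1 - self;
    /* seq_cst stores: they cannot be reordered after the seq_cst
       loads below (the store->load case a compiler-only barrier
       cannot enforce on x86). */
    atomic_store(&interested[self], true);
    atomic_store(&turn, other);
    while (atomic_load(&interested[other]) && atomic_load(&turn) == other)
        ; /* spin until the other thread leaves or yields its turn */
}

void peterson_unlock(int self)
{
    atomic_store(&interested[self], false);
}
```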

Original question on Stack Overflow: https://stackoverflow.com/questions/38741512/
