c++ - Is there a real working example that shows the side effects of Store-Load reordering on x86_64?

It is well known that Store-Load reordering is possible on x86_64 when there is no MFENCE between the store and the load.

8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations

It is also known that Store-Load reordering can occur in an example like this one:

c.store(relaxed) <--> b.load(seq_cst): https://stackoverflow.com/a/42857017/1558037

// Atomic load-store
void test() {
    std::atomic<int> b, c;
    c.store(4, std::memory_order_relaxed);          // movl 4,[c];
    int tmp = b.load(std::memory_order_seq_cst);    // movl [b],[tmp];
}

can be reordered to:

// Atomic load-store
void test() {
    std::atomic<int> b, c;
    int tmp = b.load(std::memory_order_seq_cst);    // movl [b],[tmp];
    c.store(4, std::memory_order_relaxed);          // movl 4,[c];
}

because the generated x86_64 code contains no MFENCE:

clang 4.0.0 - x86_64: https://godbolt.org/g/N9CPyJ
gcc 7.0 - x86_64: https://godbolt.org/g/MdjvI0
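To make the comparison concrete, here is a minimal sketch of the two variants as separate functions that can be pasted into a compiler explorer. The function names are made up for illustration, and the instruction sequences in the comments describe what compilers typically emit, not a guarantee.

```cpp
#include <atomic>

std::atomic<int> b(0), c(0);

// Store(relaxed) followed by Load(seq_cst).
// On x86_64 this is typically two plain mov instructions with no barrier
// in between, so the store can still be in the store buffer when the
// load executes.
int store_relaxed_then_load() {
    c.store(4, std::memory_order_relaxed);
    return b.load(std::memory_order_seq_cst);
}

// Store(seq_cst) followed by Load(seq_cst).
// Compilers typically emit mov + mfence, or a single xchg, for the store,
// which drains the store buffer before the load can execute.
int store_seq_cst_then_load() {
    c.store(4, std::memory_order_seq_cst);
    return b.load(std::memory_order_seq_cst);
}
```

In a single thread both functions behave identically; the difference can only become visible when another thread observes `b` and `c` concurrently.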

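But is there a real working example that shows the side effects of Store-Load reordering on x86_64?

For example, one that gives the correct result with Store(seq_cst), Load(seq_cst) but a wrong result with Store(relaxed), Load(seq_cst).

Or is Store-Load reordering allowed on x86_64 precisely because it cannot be detected and demonstrated in a program?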

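Best answer

Yes, there is an example of Store-Load reordering in C++11 on x86_64.

First we rigorously prove that our code is correct. Then we remove the mfence barrier between the STORE and the LOAD in that code and watch the algorithm break.

The example is a custom lock (a spin-lock) implemented without any CAS/RMW operations, using only loads and stores. It supports a limited number of threads, each numbered 0-4: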

// Store-Load reordering becomes observable if the first store uses store(release)
#include <atomic>
#include <cstddef>

struct lock_t {
    static const size_t max_locks = 5;
    std::atomic<int> locks[max_locks];

    bool lock(size_t const thread_id) {

        locks[thread_id].store(1, std::memory_order_seq_cst);                     // Store
        // store(seq_cst): mov; mfence;
        // store(release): mov;

        for (size_t i = 0; i < max_locks; ++i)
            if (locks[i].load(std::memory_order_seq_cst) > 0 && i != thread_id) { // Load
                locks[thread_id].store(0, std::memory_order_release);   // undo lock
                return false;
            }
        return true;
    }

    void unlock(size_t const thread_id) {
        locks[thread_id].store(0, std::memory_order_release);
    }
};

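First we rigorously prove that the algorithm is correct, using acquire-release semantics:

Then we show how to break our locking algorithm. The result should be 20000:
- Good example, with no Store-Load reordering (result: 20000): http://coliru.stacked-crooked.com/a/baba611d686f0320
- Bad example, in which Store-Load reordering occurs (result: 19976): http://coliru.stacked-crooked.com/a/99ff821b9f0127f4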
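The linked programs are not reproduced here, so the following is only a sketch of what such a test might look like. It assumes two threads that each perform 10000 increments, which would give the expected 20000. `lock_variant_t` and `run_counter_test` are hypothetical names: the first is a copy of `lock_t` with the memory order of the first store turned into a template parameter. The sketch also zero-initializes the flags and yields while retrying, both of which are choices made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Copy of lock_t with the order of the first store as a template
// parameter (an assumption made for this sketch).
template <std::memory_order StoreOrder>
struct lock_variant_t {
    static const std::size_t max_locks = 5;
    std::atomic<int> locks[max_locks];

    lock_variant_t() {
        for (auto &l : locks) l.store(0, std::memory_order_relaxed);
    }

    bool lock(std::size_t const thread_id) {
        locks[thread_id].store(1, StoreOrder);                          // Store
        for (std::size_t i = 0; i < max_locks; ++i)
            if (locks[i].load(std::memory_order_seq_cst) > 0 && i != thread_id) { // Load
                locks[thread_id].store(0, std::memory_order_release);   // undo lock
                return false;
            }
        return true;
    }

    void unlock(std::size_t const thread_id) {
        locks[thread_id].store(0, std::memory_order_release);
    }
};

// Each thread increments a shared counter `iterations` times under the lock
// and the final counter value is returned.
template <std::memory_order StoreOrder>
int run_counter_test(std::size_t n_threads, int iterations) {
    assert(n_threads <= lock_variant_t<StoreOrder>::max_locks);
    lock_variant_t<StoreOrder> lock;
    std::atomic<int> counter(0);

    auto worker = [&](std::size_t id) {
        for (int i = 0; i < iterations; ++i) {
            while (!lock.lock(id)) std::this_thread::yield();
            // Deliberately a separate load and store instead of fetch_add:
            // if two threads hold the lock at once, an update is lost.
            counter.store(counter.load(std::memory_order_relaxed) + 1,
                          std::memory_order_relaxed);
            lock.unlock(id);
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < n_threads; ++id)
        threads.emplace_back(worker, id);
    for (auto &t : threads) t.join();
    return counter.load();
}
```

Calling `run_counter_test<std::memory_order_seq_cst>(2, 10000)` should always return 20000. With `std::memory_order_release` as the template argument the result may occasionally come out lower on x86_64, depending on the hardware and on timing.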

C++ diff:

Then we show the difference in the assembly code:
- Good example, with no Store-Load reordering (has mfence): https://godbolt.org/g/WrCiyW
- Bad example, in which Store-Load reordering occurs (no mfence): https://godbolt.org/g/Eo3TXR

Asm x86_64 diff:

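The "good" algorithm is rigorously proven correct, while the "bad" algorithm visibly fails (its result, 19976, is not equal to 20000). The only difference between them is the mfence barrier between the STORE and the LOAD. We have therefore presented an algorithm in which Store-Load reordering actually occurs.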

In addition, there is at least one more example of Store-Load reordering that is somewhat similar to ours: Can x86 reorder a narrow store with a wider load that fully contains it?
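The same effect can also be reduced to the classic store-buffer litmus test: two threads each store 1 to their own variable and then load the other one. The sketch below is not part of the linked code; `count_both_zero` and the coordination through `round` and `done` are assumptions made for illustration. It counts how often both threads read 0, an outcome that is impossible when both the store and the load are seq_cst.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Returns the number of iterations in which both threads read 0.
template <std::memory_order StoreOrder>
long count_both_zero(long iterations) {
    std::atomic<int> x(0), y(0);
    std::atomic<long> round(0);  // iteration the workers are allowed to run
    std::atomic<int> done(0);    // workers finished in the current iteration
    int r1 = 0, r2 = 0;

    // Each worker stores 1 to its own variable, then loads the other one.
    auto worker = [&](std::atomic<int> &mine, std::atomic<int> &other, int &result) {
        for (long i = 1; i <= iterations; ++i) {
            while (round.load() != i) std::this_thread::yield();
            mine.store(1, StoreOrder);                       // Store
            result = other.load(std::memory_order_seq_cst);  // Load
            done.fetch_add(1);                               // publish result
        }
    };
    std::thread t1(worker, std::ref(x), std::ref(y), std::ref(r1));
    std::thread t2(worker, std::ref(y), std::ref(x), std::ref(r2));

    long both_zero = 0;
    for (long i = 1; i <= iterations; ++i) {
        x.store(0);
        y.store(0);
        done.store(0);
        round.store(i);                                      // start both workers
        while (done.load() != 2) std::this_thread::yield();
        if (r1 == 0 && r2 == 0) ++both_zero;                 // each load ran ahead of the other store
    }
    t1.join();
    t2.join();
    return both_zero;
}
```

`count_both_zero<std::memory_order_seq_cst>(n)` should always return 0. With `std::memory_order_relaxed` or `std::memory_order_release` a nonzero count may show up on x86_64, although how often depends on the machine and on how closely the two threads happen to run together.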

This question, "c++ - Is there a real working example that shows the side effects of Store-Load reordering on x86_64?", comes from Stack Overflow: https://stackoverflow.com/questions/42909137/
