c++ - Reading a volatile variable outside mutex scope, as opposed to std::atomic

I'm trying to optimize for consumer latency in an SPSC queue like this:

template <typename TYPE>
class queue
{
public:

    void produce(message m)
    {
        const auto lock = std::scoped_lock(mutex);
        has_new_messages = true;
        new_messages.emplace_back(std::move(m));
    }

    void consume()
    {
        if (UNLIKELY(has_new_messages))
        {
            const auto lock = std::scoped_lock(mutex);
            has_new_messages = false;
            messages_to_process.insert(
                messages_to_process.cend(),
                std::make_move_iterator(new_messages.begin()),
                std::make_move_iterator(new_messages.end()));
            new_messages.clear();
        }

        // handle messages_to_process, and then...

        messages_to_process.clear();
    }

private:
    TYPE has_new_messages{false};
    std::vector<message> new_messages{};
    std::vector<message> messages_to_process{};

    std::mutex mutex;
};

Here the consumer tries to avoid paying for locking/unlocking the mutex whenever possible, by checking the flag before taking the lock.

The question is: do I absolutely have to use TYPE = std::atomic<bool>, or can I save on atomic operations, so that reading a volatile bool is fine?

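It's known that a volatile variable per se doesn't guarantee thread safety. However, std::mutex::lock() and std::mutex::unlock() provide some memory-ordering guarantees. Can I rely on them for changes to volatile bool has_new_messages to eventually become visible to the consumer thread outside the mutex scope?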

Update: following @Peter Cordes's advice, I rewrote it as follows:

    void produce(message m)
    {
        {
            const auto lock = std::scoped_lock(mutex);
            new_messages.emplace_back(std::move(m));
        }
        has_new_messages.store(true, std::memory_order_release);
    }

    void consume()
    {
        if (UNLIKELY(has_new_messages.exchange(false, std::memory_order_acq_rel)))
        {
            const auto lock = std::scoped_lock(mutex);
            messages_to_process.insert(...);
            new_messages.clear();
        }
    }

Best answer

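It can't be a plain bool. The spin loop in your reader would optimize into something like:
if (!has_new_messages) infinite_loop;, because the compiler can hoist the load out of the loop: it is allowed to assume that the variable doesn't change asynchronously.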

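volatile works on some platforms (including most mainstream CPUs such as x86-64 or ARM) as a crappy alternative to atomic loads/stores with memory_order_relaxed, for types that are "naturally" atomic (e.g. int or bool, because the ABI gives them natural alignment). That is, platforms where lock-free atomic loads/stores use the same asm as normal loads/stores.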

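I recently wrote an answer comparing volatile with relaxed atomic for an interrupt handler, but it's basically the same for concurrent threads. has_new_messages.load(std::memory_order_relaxed) compiles to the same asm you'd get from volatile on normal platforms (i.e. no extra fence instructions, just a plain load or store), but it's legal, portable C++.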

You can and should just use std::atomic<bool> has_new_messages; with mo_relaxed loads/stores outside the mutex, if doing the same thing with volatile would have been safe.
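As a rough illustration of that advice, here is a minimal sketch of the question's queue with a std::atomic<bool> flag accessed with relaxed ordering. The names flagged_queue and message (aliased to std::string) are hypothetical stand-ins, not from the original post, and consume() returns the drained batch only to keep the example small. The flag is written only while the mutex is held, so the mutex still orders the vector accesses.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type, used only for this illustration.
using message = std::string;

class flagged_queue
{
public:
    void produce(message m)
    {
        const std::scoped_lock lock(mutex_);
        new_messages_.emplace_back(std::move(m));
        // Relaxed is enough here: the mutex orders the vector accesses.
        has_new_messages_.store(true, std::memory_order_relaxed);
    }

    // Returns the messages drained by this call (possibly none).
    std::vector<message> consume()
    {
        std::vector<message> drained;
        // Cheap check outside the mutex; no lock is taken when it reads false.
        if (has_new_messages_.load(std::memory_order_relaxed))
        {
            const std::scoped_lock lock(mutex_);
            has_new_messages_.store(false, std::memory_order_relaxed);
            drained.swap(new_messages_);
        }
        return drained;
    }

private:
    std::atomic<bool> has_new_messages_{false};
    std::vector<message> new_messages_;
    std::mutex mutex_;
};
```

Because the flag is cleared under the same lock that drains the vector, a stale false only delays the consumer until its next poll; it cannot lose a message.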

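Your writer should probably set the flag after releasing the mutex, or perhaps use a memory_order_release store at the end of the critical section. There's no point in having the reader break out of its spin loop and try to take the mutex when the writer hasn't actually released it yet.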

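BTW, if your reader thread is spinning on has_new_messages waiting for it to become true, you should use _mm_pause() in the loop on x86 to save power and to avoid a memory-order mis-speculation pipeline clear when the value does change. Also consider falling back to OS-assisted sleep/wake after spinning a few thousand times. See What does __asm volatile ("pause" ::: "memory"); do?, and for more about memory written by one thread and read by another, see What are the latency and throughput costs of producer-consumer sharing of a memory location between hyper-siblings versus non-hyper siblings? (which includes some memory-order mis-speculation results).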
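A minimal sketch of such a spin-then-back-off wait, under some assumptions: cpu_relax and wait_for_flag are hypothetical helper names, the default spin limit of 4000 is an arbitrary stand-in for "a few thousand", and std::this_thread::yield() is only a placeholder for a real OS-assisted sleep/wake mechanism (a condition variable or futex would be the more serious choice).

```cpp
#include <atomic>
#include <thread>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define SPIN_HAVE_X86_PAUSE 1
#endif

// Hypothetical helper: one polite spin-loop iteration.
inline void cpu_relax()
{
#ifdef SPIN_HAVE_X86_PAUSE
    _mm_pause();                 // x86 spin-wait hint
#else
    std::this_thread::yield();   // portable fallback on other targets
#endif
}

// Polls the flag until it reads true. Spins with cpu_relax() for the first
// spin_limit failed polls, then yields to the scheduler between polls.
// Returns the number of failed polls before the flag was seen as true.
inline long wait_for_flag(const std::atomic<bool>& flag, long spin_limit = 4000)
{
    long failed_polls = 0;
    while (!flag.load(std::memory_order_acquire))
    {
        if (failed_polls < spin_limit)
            cpu_relax();
        else
            std::this_thread::yield();
        ++failed_polls;
    }
    return failed_polls;
}
```

The acquire load is chosen so that it pairs with a release store in the writer; if the flag only guards a mutex acquisition, a relaxed load would do as well.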

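Or better, use a lock-free SPSC queue; there are lots of implementations using a fixed-size ring buffer, where there is no contention between reader and writer as long as the queue is neither full nor empty. If you arrange for the reader's and the writer's atomic position counters to be in separate cache lines, it should perform well.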
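To make the ring-buffer idea concrete, here is a minimal sketch rather than a production implementation. The name spsc_ring, the power-of-two capacity requirement, and the 64-byte cache-line constant are assumptions made for this example; T must be default-constructible and move-assignable, and exactly one thread may call try_push while exactly one other calls try_pop.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Assumed cache-line size; 64 bytes is typical on x86-64 but not guaranteed.
inline constexpr std::size_t cache_line = 64;

template <typename T, std::size_t Capacity>
class spsc_ring
{
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false if the ring is full.
    bool try_push(T value)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity)
            return false;
        slots_[tail & (Capacity - 1)] = std::move(value);
        // Publish the slot: the consumer's acquire load of tail_ sees it.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt if the ring is empty.
    std::optional<T> try_pop()
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail)
            return std::nullopt;
        T value = std::move(slots_[head & (Capacity - 1)]);
        // Hand the slot back to the producer.
        head_.store(head + 1, std::memory_order_release);
        return std::optional<T>(std::move(value));
    }

private:
    // Each counter sits in its own cache line to avoid false sharing.
    alignas(cache_line) std::atomic<std::size_t> head_{0}; // written by consumer
    alignas(cache_line) std::atomic<std::size_t> tail_{0}; // written by producer
    alignas(cache_line) std::array<T, Capacity> slots_{};
};
```

The counters increase monotonically and are masked only when indexing, which is why the capacity has to be a power of two: unsigned wraparound then stays consistent with the mask.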

changes to volatile bool has_new_messages to be eventually visible to the consumer thread

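This is a common misconception. Any store will very quickly become visible to all other CPU cores, because they all share a coherent cache domain, and stores are committed to it as quickly as possible without any need for a fence instruction.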

See If I don't use fences, how long could it take a core to see another core's writes?. The worst case is probably about a microsecond, to within an order of magnitude; usually it is less.

And volatile or atomic ensures that there will actually be a store in the compiler-generated asm.

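(Related: current compilers basically don't optimize atomic<T> at all, so atomic is basically equivalent to volatile atomic. See Why don't compilers merge redundant std::atomic writes?. But even without that, the compiler couldn't skip doing the store or hoist the load out of a spin loop.)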

Regarding c++ - Reading a volatile variable outside mutex scope, as opposed to std::atomic, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51624607/
