c++ - What does `static_cast<volatile void>` mean for the optimizer?

When people try to run rigorous benchmarks in various libraries, I sometimes see code like this:

auto std_start = std::chrono::steady_clock::now();
for (int i = 0; i < 10000; ++i)
  for (int j = 0; j < 10000; ++j)
    volatile const auto __attribute__((unused)) c = std_set.count(i + j);
auto std_stop = std::chrono::steady_clock::now();

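Here volatile is used to keep the optimizer from noticing that the result of the code under test is discarded, and then discarding the entire computation.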

When the code under test doesn't return a value, say it is void do_something(int), I sometimes see code like this:

auto std_start = std::chrono::steady_clock::now();
for (int i = 0; i < 10000; ++i)
  for (int j = 0; j < 10000; ++j)
    static_cast<volatile void> (do_something(i + j));
auto std_stop = std::chrono::steady_clock::now();

Is this usage of volatile correct? What is volatile void? What does it mean from the point of view of the compiler and the standard?

The standard (N4296) says at [dcl.type.cv]:

7 [ Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. — end note ]

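Section 1.9 gives a lot of guidance about the execution model, but as far as volatile is concerned, it is about "accessing a volatile object". It is not clear to me what executing a statement that has been cast to volatile void means (assuming I understand the code correctly), or exactly what optimization barrier, if any, it produces.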

Best answer

static_cast<volatile void> (foo()) does not work as a way to require the compiler to actually compute foo() in any of gcc/clang/MSVC/ICC with optimization enabled.

#include <bitset>

void foo() {
    for (int i = 0; i < 10000; ++i)
        for (int j = 0; j < 10000; ++j) {
            std::bitset<64> std_set(i + j);
            //volatile const auto c = std_set.count();     // real work happens
            static_cast<volatile void> (std_set.count());  // optimizes away
        }
}

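This compiles to just ret with all 4 of the major x86 compilers. (MSVC emits asm for stand-alone definitions of std::bitset::count() or something, but scroll down for its trivial definition of foo().)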

(Source + asm output for this example and the next one on Matt Godbolt's compiler explorer)
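To make the distinction concrete, here is a minimal sketch that is not part of the original answer. The counter `calls`, the body of `do_something`, and the wrapper `run_loop` are all hypothetical. It illustrates that the cast operand is still evaluated as a discarded-value expression, so a side effect whose result is used later must be preserved, while no volatile object is ever accessed, which is why a side-effect-free operand can legally disappear.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical function under test: its only effect is bumping a counter.
static int calls = 0;
void do_something(int) { ++calls; }

// volatile void is still just a (cv-qualified) void type; no object is involved.
static_assert(std::is_void<volatile void>::value, "volatile void is a void type");

// Hypothetical wrapper: runs the cast n times and reports how often
// do_something was actually called.
int run_loop(int n) {
    assert(n >= 0);
    calls = 0;
    for (int i = 0; i < n; ++i)
        static_cast<volatile void>(do_something(i));  // operand evaluated, result discarded
    return calls;
}
```

Because the value of `calls` is returned, the compiler has to produce the right count, but it is still free to do so without running a loop at all.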

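Maybe there are some compilers where static_cast<volatile void>() does do something, in which case it could be a lighter-weight way to write a repeat loop that doesn't spend instructions storing the result to memory and only computes it. (This may sometimes be what you want in a microbenchmark.)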

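Accumulating the result with tmp += foo() (or tmp |=) and returning it from main() or printing it with printf can also be useful, instead of storing into a volatile variable. Or use various compiler-specific things, like an empty inline asm statement, to break the compiler's ability to optimize without actually adding any instructions.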
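The accumulate-and-print approach could look roughly like the sketch below. It is not from the original answer: `sum_popcounts` and `time_sum_popcounts` are hypothetical names, and the popcount workload is borrowed from the earlier example only for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical helper: every iteration feeds tmp, and tmp is returned,
// so the per-iteration results cannot simply be thrown away.
unsigned long sum_popcounts(int n) {
    assert(n >= 0);
    unsigned long tmp = 0;
    for (int i = 0; i < n; ++i) {
        std::bitset<64> std_set(i);
        tmp += std_set.count();
    }
    return tmp;
}

// Hypothetical timing wrapper: prints the accumulated value so it stays
// observable, and returns the elapsed time in seconds.
double time_sum_popcounts(int n) {
    auto start = std::chrono::steady_clock::now();
    unsigned long total = sum_popcounts(n);
    auto stop = std::chrono::steady_clock::now();
    std::printf("total = %lu\n", total);
    return std::chrono::duration<double>(stop - start).count();
}
```

Keep in mind that this only guarantees the final sum is computed. The compiler may still vectorize or otherwise restructure the loop, so what gets timed is the optimized loop, not necessarily one call per iteration.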

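See Chandler Carruth's CppCon2015 talk on using perf to investigate compiler optimizations, where he shows an optimizer-escape function for GNU C. But his escape() function is written to require the value to be in memory (it passes the asm a void* to it, with a "memory" clobber). We don't need that; we just need the compiler to have the value in a register or memory, or even as an immediate constant. (It is unlikely to fully unroll our loop, because it doesn't know that the asm statement is zero instructions.)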

On gcc, this code compiles to just the popcnt, without any extra stores.

#include <bitset>

// just force the value to be in memory, register, or even immediate
// instead of empty inline asm, use the operand in a comment so we can see what the compiler chose.
// Absolutely no effect on optimization.
static void escape_integer(int a) {
    asm volatile("# value = %0" : : "g"(a));
}

// simplified with just one inner loop
void test1() {
    for (int i = 0; i < 10000; ++i) {
        std::bitset<64> std_set(i);
        int count = std_set.count();
        escape_integer(count);
    }
}

# gcc8.0 20171110 nightly -O3 -march=nehalem  (for popcnt instruction):
test1():
        # value = 0          # it peels the first iteration with an immediate 0 for the inline asm.
        mov     eax, 1
.L4:
        popcnt  rdx, rax
        # value = edx        # the inline-asm comment has the %0 filled in to show where gcc put the value
        add     rax, 1
        cmp     rax, 10000
        jne     .L4
        ret

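Clang chooses to put the value in memory to satisfy the "g" constraint, which is pretty dumb. But clang does tend to do that when you give it an inline-asm constraint that includes memory as an option. So for this it is no better than Chandler's escape function.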

# clang5.0 -O3 -march=nehalem
test1():
        xor     eax, eax
        #DEBUG_VALUE: i <- 0
.LBB1_1:                               # =>This Inner Loop Header: Depth=1
        popcnt  rcx, rax
        mov     dword ptr [rsp - 4], ecx
        # value = -4(%rsp)             # inline asm gets a value in memory
        inc     rax
        cmp     rax, 10000
        jne     .LBB1_1
        ret

ICC18 with -march=haswell does this:

test1():
        xor       eax, eax                                      #30.16
..B2.2:                         # Preds ..B2.2 ..B2.1
                                # optimization report
                                # %s was not vectorized: ASM code cannot be vectorized
        xor       rdx, rdx              # breaks popcnt's false dep on the destination
        popcnt    rdx, rax                                      #475.16
        inc       rax                                           #30.34
        # value = edx
        cmp       rax, 10000                                    #30.25
        jl        ..B2.2        # Prob 99%                      #30.25
        ret                                                     #35.1

This is weird: ICC used xor rdx,rdx instead of the 32-bit form xor edx,edx. That wastes a REX prefix, and it is not recognized as dependency-breaking on Silvermont/KNL.
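Following up on the clang output above, one possible variation (a sketch that is not part of the original answer) is to use a register-only "r" constraint instead of "g", so that memory is no longer an option the compiler can pick. The names `escape_integer_reg` and `test1_reg` are hypothetical, the asm template is left empty, and I have not checked the generated asm across compiler versions. This assumes GNU C extended asm, so it does not apply to MSVC.

```cpp
#include <bitset>
#include <cassert>

// Assumption: GNU C extended inline asm (gcc, clang, ICC).
// "r" asks for the value in a general-purpose register; the empty
// template adds no instructions of its own.
static inline void escape_integer_reg(int a) {
    asm volatile("" : : "r"(a));
}

// Hypothetical variant of test1() that takes the trip count as a parameter
// and returns the last popcount so the result can be checked.
int test1_reg(int n) {
    assert(n >= 0);
    int last = 0;
    for (int i = 0; i < n; ++i) {
        std::bitset<64> std_set(i);
        last = static_cast<int>(std_set.count());
        escape_integer_reg(last);  // compiler must materialize the value each iteration
    }
    return last;
}
```

The trade-off is that "r" also rules out the immediate-constant case, so a value the compiler already knows at compile time has to be loaded into a register first.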

Regarding c++ - What does `static_cast<volatile void>` mean for the optimizer?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47253655/
