c++ - Test of 8 subsequent bytes isn't translated into a single compare instruction

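Motivated by this question, I compared three different functions that check whether the 8 bytes pointed to by their argument are all zero (note that in the original question, the characters are compared with '0', not 0):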

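#include <cstring>  // std::memcmp (used by f3)

// f1: returns as soon as a non-zero byte is found.
bool f1(const char *ptr)
{
  for (int i = 0; i < 8; i++)
    if (ptr[i])
      return false;
  return true;
}

// f2: always inspects all 8 bytes and accumulates the result.
bool f2(const char *ptr)
{
  bool res = true;
  for (int i = 0; i < 8; i++)
    res &= (ptr[i] == 0);
  return res;
}

// f3: compares all 8 bytes against a zero-filled buffer.
bool f3(const char *ptr)
{
  static const char tmp[8]{};
  return !std::memcmp(ptr, tmp, 8);
}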

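I would expect the same assembly output with optimizations enabled, but only the memcmp version is translated into a single cmp instruction on x64. Both f1 and f2 are translated into either rolled or unrolled loops. Moreover, this holds for GCC, Clang, and the Intel compiler, all with -O3.
Is there any reason why f1 and f2 cannot be optimized into a single compare instruction? It seems to be a pretty straightforward optimization to me.
Live demo: https://godbolt.org/z/j48366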
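For comparison, a common portable way to ask for one 8-byte comparison explicitly is to copy the bytes into a 64-bit integer with std::memcpy and test that integer. The sketch below is a minimal illustration under that assumption; f4 is a hypothetical name that does not appear in the original question. With -O2/-O3 on x86-64, GCC and Clang typically lower this to a single load followed by a test, but the exact instructions are compiler-dependent.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical f4: same semantics as f2/f3 (always examines all 8 bytes),
// but written as one 8-byte copy into an integer. std::memcpy is well-defined
// for any alignment of ptr, so no aligned-access assumption is needed.
bool f4(const char *ptr)
{
  std::uint64_t word;
  std::memcpy(&word, ptr, sizeof word);  // requires all 8 bytes to be readable
  return word == 0;                      // all bytes zero <=> the word is zero
}
```

Like f2 and f3, f4 requires all 8 bytes to be valid to read, which is exactly the precondition f1 does not have.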

Best answer

Is there any reason why f1 and f2 cannot be optimized into a single compare instruction (possibly with additional unaligned load)? It seems to be a pretty straightforward optimization to me.

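In f1, the loop stops as soon as ptr[i] is true, so it is not always equivalent to examining all 8 elements, as the other two functions do, or to directly comparing an 8-byte word when the array is shorter than 8 bytes (the compiler does not know the size of the array):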

f1("\000\001"); // no access out of the array
f2("\000\001"); // access out of the array
f3("\000\001"); // access out of the array
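The early exit can be made visible with an instrumented copy of f1. This is a minimal sketch; f1_counted and its reads parameter are hypothetical additions used only to count how many bytes the original control flow actually touches.

```cpp
#include <cstddef>

// Hypothetical instrumented copy of f1: identical control flow, but it also
// reports how many bytes were read before the function returned.
bool f1_counted(const char *ptr, std::size_t &reads)
{
  reads = 0;
  for (int i = 0; i < 8; i++)
  {
    ++reads;          // ptr[i] is about to be read
    if (ptr[i])
      return false;   // early exit: bytes after index i are never touched
  }
  return true;
}
```

Called on "\000\001", f1_counted returns false after reading only ptr[0] and ptr[1], so replacing the loop with a single 8-byte load would read bytes that the original program never accesses.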

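For f2, I agree that the loop could be replaced by an 8-byte comparison, provided the CPU allows reading an 8-byte word from any address alignment, which is the case on x64; however, this can introduce unusual situations, as explained in Unusual situations where this wouldn't be safe in x86 asm.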
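Much of the reasoning about when a wider load is safe at the assembly level comes down to memory protection, which works per page: a load that stays inside the page containing a readable byte cannot fault because of that page. A minimal sketch of the page-crossing check, assuming 4 KiB pages (typical on x86-64, but not guaranteed); crosses_page and kPageSize are hypothetical names. This is asm-level reasoning only: in C++ itself, reading past the end of an object is undefined behavior regardless of page layout.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages.
constexpr std::uintptr_t kPageSize = 4096;

// Returns true if an n-byte access starting at addr spans two pages,
// i.e. the access could touch a page that the byte at addr does not live in.
constexpr bool crosses_page(std::uintptr_t addr, std::uintptr_t n)
{
  return (addr % kPageSize) + n > kPageSize;
}
```

For example, an 8-byte load starting at offset 4088 within a page ends exactly at the page boundary, while one starting at offset 4089 spills one byte into the next page.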

A similar question, "c++ - Test of 8 subsequent bytes isn't translated into a single compare instruction", can be found on Stack Overflow: https://stackoverflow.com/questions/63392584/
