I have some SIMD code that checks variables for equality, but when NaN is involved I get different results between GCC and clang:
#include <iostream>
#include <limits>
#include <xmmintrin.h>

// Compares the lowest float lanes of a and b for equality.
bool equal(__m128 a, __m128 b) {
    return _mm_comieq_ss(a, b) == 1;
}

int main()
{
    __m128 a, b, c;
    a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
    b = _mm_set_ss(1.0f);
    c = _mm_set_ss(1.0f);
    std::cout << "comieq(a,b):" << equal(a,b) << std::endl;
    std::cout << "comieq(b,a):" << equal(b,a) << std::endl;
    std::cout << "comieq(b,c):" << equal(b,c) << std::endl;
    std::cout << "comieq(a,a):" << equal(a,a) << std::endl;
    return 0;
}
Clang and GCC return different values:
gcc:
comieq(a,b):1
comieq(b,a):1
comieq(b,c):1
comieq(a,a):1
clang:
comieq(a,b):0
comieq(b,a):0
comieq(b,c):1
comieq(a,a):0
Does anyone know why this happens? I just want to check whether two registers are equal; is there a consistent alternative?
Best answer
The handling of the return value when comparing NaN values was changed specifically in Clang 3.9.0; see the Related Link.
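Although one might expect an intrinsic to be inherent to the CPU and independent of the compiler, the comiss instruction produces its result in several FLAGS bits. Different intrinsics check different predicates to define a single boolean return value; in asm, a programmer can consume the comparison result with combinations of instructions such as je, jb, and/or jp, or setcc/cmovcc.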
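To make the flag semantics concrete, here is a minimal sketch that models, in plain C++, the three FLAGS bits that comiss writes (ZF, PF, CF) and two predicates built from them. The names ComissFlags, model_comiss, equal_zf_only, and equal_ordered are hypothetical and chosen only for illustration; the flag table follows the documented behavior of the COMISS instruction.

```cpp
#include <cmath>

// Hypothetical model (not a real API) of the FLAGS bits written by COMISS.
struct ComissFlags {
    bool zf;  // zero flag: set for "equal" and for "unordered"
    bool pf;  // parity flag: set only for "unordered" (a NaN input)
    bool cf;  // carry flag: set for "less than" and for "unordered"
};

// Returns the flag combination COMISS is documented to produce for a vs. b.
inline ComissFlags model_comiss(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return {true, true, true};  // unordered
    if (a > b) return {false, false, false};                        // greater
    if (a < b) return {false, false, true};                         // less
    return {true, false, false};                                    // equal
}

// Predicate that looks at ZF only: also true when an input is NaN.
inline bool equal_zf_only(ComissFlags f) { return f.zf; }

// Predicate that also requires PF to be clear: false when an input is NaN.
inline bool equal_ordered(ComissFlags f) { return f.zf && !f.pf; }
```

With a NaN input, equal_zf_only reports "equal" while equal_ordered does not, which is the same split seen in the two outputs in the question.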
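What happens here is that GCC checks only the ZF (zero flag) value, whereas Clang also (correctly) checks PF (the "parity" flag: set if the comparison is unordered, i.e. one of the inputs is NaN. This matches the way integer FLAGS were set by P6 x87 fcomi, which in turn matches the old x87 fcom / fstsw ax / sahf).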
I'll provide a short quote from the discussion linked above, which may shed light on the reasoning behind the LLVM (clang) team's decision:
In Clang 3.8.0 and before, comparing two scalars of which at least one is a NaN would return 1. This is also the behavior that GCC, Visual Studio, and our current Emscripten code implements. This behavior is unintuitive in the sense that comparing NaNs in floats have the opposite tradition in IEEE-754, i.e. "nothing is equal to a NaN".
Intel is the original author of these intrinsics, and it must be admitted that these functions have long suffered from poor documentation. Intel doesn't spec in detail how these intrinsics should work with respect to NaNs (https://software.intel.com/en-us/node/514308), but presumably the reference implementation in their own compiler was held as the ground truth. The behavior that GCC, VS and Clang <= 3.8 each follow likely comes from adhering to the original code as implemented in Intel's compilers, where _mm_comieq_ss is implemented to perform the COMISS instruction and return the resulting zero flag (ZF) register state as the output int value of the intrinsic function. The COMISS instruction itself is though well documented since it's part of the ISA, and is shown e.g. at http://x86.renejeschke.de/html/file_module_x86_id_44.html. This shows the origin of the unexpected NaN behavior, since the zero flag is set if the comparison is equal, or if the comparison result is unordered, i.e. at least one of the registers is a NaN.
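Following Peter Cordes's comment, it is now clear that the (modified) clang behavior is correct, and that the "poor documentation" from Intel mentioned in the quote above has since been corrected. The Intel documentation for _mm_comieq_ss now clearly states that any NaN value present should produce a return value of zero: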
Operation
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0
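The documented operation can also be written as a scalar reference model, which may be handy for checking what a given compiler's _mm_comieq_ss returns. This is only a sketch: the function name comieq_ss_reference is hypothetical, and it assumes the code is not built with options such as -ffast-math that allow the compiler to assume NaN never occurs.

```cpp
#include <cmath>
#include <xmmintrin.h>

// Hypothetical reference model of the documented operation:
//   (a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0]) ? 1 : 0
inline int comieq_ss_reference(__m128 a, __m128 b) {
    const float x = _mm_cvtss_f32(a);  // lowest lane of a
    const float y = _mm_cvtss_f32(b);  // lowest lane of b
    return (!std::isnan(x) && !std::isnan(y) && x == y) ? 1 : 0;
}
```

Comparing this model's result with _mm_comieq_ss for the four input pairs from the question is one way to see whether a particular toolchain follows the documented behavior.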
Regarding "c++ - Difference between Clang and GCC for _mm_comieq_ss", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75818896/