c++ - Difference between Clang and GCC for _mm_comieq_ss

Tags: c++ gcc clang simd sse

I have some SIMD code that checks variables for equality, but when NaN is involved I get different results from GCC and Clang:

#include <iostream>
#include <limits>
#include <xmmintrin.h>

bool equal(__m128 a, __m128 b){
    return _mm_comieq_ss(a,b) == 1;
}

int main()
{
    __m128 a, b, c;

    a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
    b = _mm_set_ss(1.0f);
    c = _mm_set_ss(1.0f);
    
    std::cout << "comieq(a,b):" << equal(a,b) << std::endl;
    std::cout << "comieq(b,a):" << equal(b,a) << std::endl;
    std::cout << "comieq(b,c):" << equal(b,c) << std::endl;
    std::cout << "comieq(a,a):" << equal(a,a) << std::endl;

    return 0;
}

Clang and GCC return different values:

gcc:
comieq(a,b):1
comieq(b,a):1
comieq(b,c):1
comieq(a,a):1

clang:
comieq(a,b):0
comieq(b,a):0
comieq(b,c):1
comieq(a,a):0

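Does anyone know why this happens? I just want to check whether two registers are equal; is there a consistent alternative?

Godbolt: https://godbolt.org/z/ETKenE45f

Best answer

The handling of the return value when comparing NaN values was changed specifically in Clang 3.9.0. Related Link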

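Although one would expect intrinsics to be just that, intrinsic to the CPU and independent of the compiler, the comiss instruction produces its result in multiple FLAGS bits. Different intrinsics check different predicates to define a single bool return value; in asm, a programmer would consume the comparison result with some combination of instructions such as je, jb, and/or jp, or setcc/cmovcc.

What is happening here is that GCC checks only the ZF (zero flag) value, whereas Clang also (correctly) checks PF (the "parity" flag, which is set if the comparison is unordered, i.e. one of the inputs is NaN). This matches the way integer FLAGS were set by the P6 x87 fcomi instruction, which in turn matches the legacy x87 fcom / fstsw ax / sahf sequence.

I will offer a short quotation from the discussion linked above, which may clarify the reasoning behind the decision made by the LLVM (clang) team: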
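To make the difference between the two predicates concrete, here is a minimal sketch that models the flag results of comiss in plain C++ (no intrinsics). The names ComissFlags, model_comiss, equal_zf_only and equal_zf_and_not_pf are hypothetical and introduced only for illustration; the flag table assumed is the documented one for COMISS (unordered: ZF=PF=CF=1, greater: all clear, less: CF=1, equal: ZF=1). It is a model of the behavior described above, not the code either compiler actually emits.

```cpp
#include <limits>

// Hypothetical model of the three FLAGS bits written by COMISS.
struct ComissFlags {
    bool zf; // zero flag
    bool pf; // parity flag (set only for an unordered result)
    bool cf; // carry flag
};

// Assumed flag table: unordered -> ZF=PF=CF=1, a > b -> all clear,
// a < b -> CF=1, a == b -> ZF=1.
constexpr ComissFlags model_comiss(float a, float b) {
    // NaN is the only value that compares unequal to itself.
    return (a != a || b != b) ? ComissFlags{true, true, true}
                              : ComissFlags{a == b, false, a < b};
}

// ZF-only predicate: the behavior the answer attributes to GCC.
constexpr bool equal_zf_only(float a, float b) {
    return model_comiss(a, b).zf;
}

// ZF set and PF clear: the behavior the answer attributes to Clang >= 3.9.
constexpr bool equal_zf_and_not_pf(float a, float b) {
    return model_comiss(a, b).zf && !model_comiss(a, b).pf;
}
```

With a NaN operand the ZF-only predicate reports "equal" because an unordered result also sets ZF, while the predicate that additionally requires PF to be clear reports "not equal". For two ordinary floats, both predicates agree.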

In Clang 3.8.0 and before, comparing two scalars of which at least one is a NaN would return 1. This is also the behavior that GCC, Visual Studio, and our current Emscripten code implements. This behavior is unintuitive in the sense that comparing NaNs in floats have the opposite tradition in IEEE-754, i.e. "nothing is equal to a NaN".

Intel is the original author of these intrinsics, and it must be admitted that these functions have long suffered from poor documentation. Intel doesn't spec in detail how these intrinsics should work with respect to NaNs (https://software.intel.com/en-us/node/514308), but presumably the reference implementation in their own compiler was held as the ground truth. The behavior that GCC, VS and Clang <= 3.8 each follow likely comes from adhering to the original code as implemented in Intel's compilers, where _mm_comieq_ss is implemented to perform the COMISS instruction and return the resulting zero flag (ZF) register state as the output int value of the intrinsic function. The COMISS instruction itself is though well documented since it's part of the ISA, and is shown e.g. at http://x86.renejeschke.de/html/file_module_x86_id_44.html. This shows the origin of the unexpected NaN behavior, since the zero flag is set if the comparison is equal, or if the comparison result is unordered, i.e. at least one of the registers is a NaN.


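Following a comment from Peter Cordes, it is now clear that the (modified) clang behavior is correct, and that the "poor documentation" from Intel mentioned in the quotation above has been corrected. The Intel documentation for _mm_comieq_ss now clearly states that any NaN value present should produce a zero return value: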

Operation

RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0
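As for a consistent alternative, one possible approach (a sketch, not something taken from the linked discussion) is to avoid the comi family entirely and use the mask-producing compare _mm_cmpeq_ss together with _mm_movemask_ps. The EQ predicate of cmpss is an ordered compare, so a NaN in either operand yields a zero mask regardless of how a compiler maps FLAGS bits to a return value. The helper names equal_low_ordered and equal_reference are hypothetical; the second one simply restates the Operation pseudocode above in scalar C++ for comparison.

```cpp
#include <cmath>       // std::isnan
#include <xmmintrin.h> // SSE: _mm_cmpeq_ss, _mm_movemask_ps, _mm_set_ss

// Hypothetical helper: ordered equality of the lowest float lane.
// _mm_cmpeq_ss sets lane 0 to all ones only when both operands are
// non-NaN and equal; the upper three lanes are copied from `a`.
inline bool equal_low_ordered(__m128 a, __m128 b) {
    const __m128 mask = _mm_cmpeq_ss(a, b);
    // Bit 0 of the movemask is the sign bit of lane 0; the upper bits
    // come from `a` and are masked off.
    return (_mm_movemask_ps(mask) & 1) != 0;
}

// Scalar restatement of the Operation pseudocode in Intel's documentation.
inline bool equal_reference(float a, float b) {
    return !std::isnan(a) && !std::isnan(b) && a == b;
}
```

Substituting equal_low_ordered for equal in the question's program should give 0, 0, 1, 0 for the four comparisons, matching the documented semantics. Note that this compares only the lowest lane, as _mm_comieq_ss does; comparing all four lanes would need a packed compare such as _mm_cmpeq_ps and a check that the movemask equals 0xF.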

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/75818896/