c - Strange /fp floating-point model flag behavior

I am looking at some code that is built with the /fp:precise and /fp:fast flags.

According to the MSDN documentation for /fp:precise:

With /fp:precise on x86 processors, the compiler will perform rounding on variables of type float to the proper precision for assignments and casts and when passing parameters to a function. This rounding guarantees that the data does not retain any significance greater than the capacity of its type. A program compiled with /fp:precise can be slower and larger than one compiled without /fp:precise. /fp:precise disables intrinsics; the standard run-time library routines are used instead. For more information, see /Oi (Generate Intrinsic Functions).

Looking at the disassembly of a call to sqrtf (built with /arch:SSE2, targeting the x86/Win32 platform):

0033185D  cvtss2sd    xmm0,xmm1  
00331861  call        __libm_sse2_sqrt_precise (0333370h)  
00331866  cvtsd2ss    xmm0,xmm0

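From this question I gather that modern x86/x64 processors don't use 80-bit registers (or at least that their use is discouraged), so the compiler does what I assume is the next best thing and performs the calculation with 64-bit doubles. And since intrinsics are disabled, a library square-root routine is called.

OK, fair enough, that seems to match what the documentation says.

However, when I compile for the x64 architecture, something strange happens: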

000000013F2B199E  movups      xmm0,xmm1  
000000013F2B19A1  sqrtps      xmm1,xmm1  
000000013F2B19A4  movups      xmmword ptr [rcx+rax],xmm1

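The calculation is not performed with 64-bit doubles, and an intrinsic is used. As far as I can tell, the result is exactly the same as with the /fp:fast flag.

Why is there a discrepancy between the two? Does /fp:precise simply not apply to the x64 platform?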

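Now, as a sanity check, I tested the same code in VS2010 for x86 with /fp:precise and /arch:SSE2. Surprisingly, an SSE2 square-root instruction was emitted inline (sqrtsd in the listing below) instead of a library call!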

00AF14C7  cvtps2pd    xmm0,xmm0  
00AF14CA  sqrtsd      xmm0,xmm0  
00AF14CE  cvtpd2ps    xmm0,xmm0

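What is going on here? Why does VS2010 use the intrinsic while VS2012 calls a library routine?

Testing VS2010 with the x64 target gives results similar to VS2012 (/fp:precise appears to be ignored).

I don't have access to any older versions of VS, so I can't run any tests with them.

For reference, I am testing on 64-bit Windows 7 with an Intel i5-m430 processor.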

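Best answer

First of all, you should read this very good blog post about intermediate floating-point precision. The article only deals with code generated by Visual Studio (but that is what your question is about). Now to the examples: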

0033185D  cvtss2sd    xmm0,xmm1  
00331861  call        __libm_sse2_sqrt_precise (0333370h)  
00331866  cvtsd2ss    xmm0,xmm0

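This assembly was generated for the x86 platform with /fp:precise /arch:SSE2. According to the documentation, the precise floating-point model promotes all calculations to double internally on the x86 platform. It also prevents the use of intrinsics (I think you have already read this information). The code therefore starts with a conversion from float to double, followed by a call to a double-precision sqrt routine, and finally converts the result back to float.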

000000013F2B199E  movups      xmm0,xmm1  
000000013F2B19A1  sqrtps      xmm1,xmm1  
000000013F2B19A4  movups      xmmword ptr [rcx+rax],xmm1

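The second example was compiled for the x64 (amd64) platform, and this platform behaves completely differently! According to the documentation: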

For performance reasons, intermediate operations are computed at the widest precision of either operand instead of at the widest precision available.

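The calculation is therefore done internally in single precision. I think they also decided to use intrinsics whenever possible, so the difference between /fp:precise and /fp:fast is much smaller on the x64 platform. The new behavior results in faster code and gives the programmer better control over what exactly happens (they were able to change the rules of the game because compatibility concerns matter much less on the new x64 platform). Unfortunately, these changes/differences are not clearly stated in the documentation.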
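To see why the choice of intermediate precision can matter at all, here is a minimal C sketch that evaluates the same sum once with float intermediates and once with double intermediates. The function names are invented for illustration, and the explicit casts only imitate the two models; nothing here depends on a compiler flag.

```c
/* Sums three floats, rounding the intermediate result to float,
   i.e. working at the widest precision of the operands (the x64
   behaviour described in the quote above). */
float sum_in_float(float a, float b, float c)
{
    volatile float partial = a + b;  /* volatile forces the store to round to float */
    return partial + c;
}

/* Same sum with the intermediate kept in double and a single
   rounding to float at the end (the x86 /fp:precise style). */
float sum_in_double(float a, float b, float c)
{
    double partial = (double)a + (double)b;
    return (float)(partial + (double)c);
}
```

With a = 1e8f, b = 1.0f and c = -1e8f, `sum_in_float` returns 0.0f because 1e8f + 1.0f rounds back to 1e8f in single precision, while `sum_in_double` returns 1.0f because the intermediate 100000001 is exactly representable as a double.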

00AF14C7  cvtps2pd    xmm0,xmm0  
00AF14CA  sqrtsd      xmm0,xmm0  
00AF14CE  cvtpd2ps    xmm0,xmm0

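Finally, the last example was compiled with the Visual Studio 2010 compiler. I think they accidentally used the intrinsic for sqrt where they arguably should not have (at least in /fp:precise mode), and they decided to change/fix this behavior in Visual Studio 2012 (see here).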

Regarding "c - Strange /fp floating-point model flag behavior", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15779156/
