x86 - Intel Intrinsics Guide relative error definition

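Some Intel vector intrinsics are approximations, and the Intel Intrinsics Guide gives a maximum relative error for them. For example, the _mm256_rcp_ps intrinsic has a maximum relative error of 1.5*2^-12.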

I assume that the relative error is defined as relErr = abs((estVal-trueVal)/trueVal).
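As a minimal sketch, that assumed definition can be written as a small C helper. The name rel_err is hypothetical, and the formula is the one assumed in the question, not one quoted from the Intrinsics Guide.

```c
#include <assert.h>
#include <math.h>

/* Relative error of an estimate against a reference value.
   The quotient is undefined for true_val == 0, hence the assertion. */
double rel_err(double est_val, double true_val)
{
    assert(true_val != 0.0);
    return fabs((est_val - true_val) / true_val);
}
```

With this definition, any estimate of exactly 0 has a relative error of exactly 1, whatever the nonzero true value is.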

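But what if the true value is very small and the estimate is 0? For example, for the reciprocal of -1.11604e+38, the intrinsic _mm256_rcp_ps gives the estimate -0.0, while the true value is approximately -8.96021e-39 (which would be a denormalized float, wouldn't it? Does that have something to do with it?). The relative error would be 1, but the estimate is still good. How can I measure the quality of the estimate if it is 0? How is the relative error in the Intrinsics Guide defined if the estimate is 0?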
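The parenthetical question can be checked directly. The sketch below uses the hypothetical helper reciprocal_is_subnormal and assumes the default floating-point environment (no flush-to-zero mode, no fast-math compiler flags). It computes the correctly rounded float reciprocal and classifies it.

```c
#include <assert.h>
#include <math.h>

/* Returns nonzero when the correctly rounded float value of 1/x is a
   subnormal (denormalized) number, i.e. nonzero with magnitude < 2^-126. */
int reciprocal_is_subnormal(float x)
{
    assert(x != 0.0f);
    volatile float r = 1.0f / x;   /* volatile: force rounding to float */
    return fpclassify(r) == FP_SUBNORMAL;
}
```

For x = -1.11604e+38f the exact reciprocal has a magnitude of about 8.96e-39, which is below the smallest normal float (about 1.17549e-38), so the helper reports a subnormal result.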

Best answer

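The Intrinsics Guide frequently leaves things out; it is mostly only good for finding the intrinsic name for an asm instruction and for short summaries of what the instructions do.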

Yes, for normalized floating-point results, the top 11 or 12 bits of the mantissa will be correct, as specified by |Relative Error| ≤ 1.5 * 2^−12.
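One way to see this bound in practice is to sweep over inputs whose reciprocals stay normalized and record the worst relative error. The following is only a sketch: approx_rcp and max_rcp_rel_err are hypothetical names, the scalar _mm_rcp_ss is used in place of _mm256_rcp_ps so that only SSE is required, and the fallback to exact division on non-x86 targets is an assumption added so the code compiles anywhere.

```c
#include <assert.h>
#include <math.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_RCPSS 1
#endif

/* Approximate reciprocal of one float via RCPSS where available. */
float approx_rcp(float x)
{
#ifdef HAVE_RCPSS
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
#else
    return 1.0f / x;   /* assumption: exact fallback without SSE */
#endif
}

/* Largest relative error seen for inputs lo, lo*step, lo*step^2, ... <= hi. */
double max_rcp_rel_err(float lo, float hi, double step)
{
    assert(lo > 0.0f && hi >= lo && step > 1.0);
    double worst = 0.0;
    for (double x = lo; x <= hi; x *= step) {
        float xf = (float)x;
        double exact = 1.0 / (double)xf;   /* reference computed in double */
        double err = fabs(((double)approx_rcp(xf) - exact) / exact);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling max_rcp_rel_err(1.0e-30f, 1.0e30f, 1.001) keeps both inputs and results well inside the normal range, so the returned value should not exceed 1.5 / 4096.0 on hardware that meets the documented bound. The exact value differs between CPU vendors and generations.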

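The asm manual also documents that the instruction ignores the MXCSR bits FTZ (flush to zero) and DAZ (denormals are zero), and always behaves basically as if they were set: tiny (subnormal) inputs are treated as zero, and tiny results are also flushed to zero.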
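Those flushing rules can be modelled in portable C. The function rcp_flush_model below is a hypothetical reference model, not the hardware approximation: it uses an exact division and only reproduces the special-case handling described here, assuming that "tiny" means a nonzero magnitude below 2^-126.

```c
#include <assert.h>
#include <math.h>

/* Exact reciprocal combined with the flushing behaviour described above. */
float rcp_flush_model(float x)
{
    int cls = fpclassify(x);
    if (cls == FP_NAN)
        return x;
    /* Zero or subnormal input counts as a signed zero: signed infinity. */
    if (cls == FP_ZERO || cls == FP_SUBNORMAL)
        return signbit(x) ? -INFINITY : INFINITY;
    float r = 1.0f / x;
    /* A tiny (subnormal) result is flushed to zero with the operand's sign. */
    if (fpclassify(r) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return r;
}

/* Usage example: the value from the question comes out as -0.0. */
void check_rcp_flush_model(void)
{
    float r = rcp_flush_model(-1.11604e+38f);
    assert(r == 0.0f && signbit(r));
    assert(isinf(rcp_flush_model(0x1p-130f)));   /* subnormal input */
    assert(rcp_flush_model(2.0f) == 0.5f);
}
```

Under this model the -0.0 seen in the question is the expected output rather than a loss of accuracy, so the relative-error bound is only meaningful for results that are not flushed.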

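See https://www.felixcloutier.com/x86/rcpps. The Description section goes into detail and gives some guarantees/ranges for when the relative error could produce a subnormal result (which is flushed to zero) even though the exact result would still be normalized.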

The RCPPS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ of the sign of the source value is returned. A denormal source value is treated as a 0.0 (of the same sign). Tiny results (see Section 4.9.1.5, “Numeric Underflow Exception (#U)” in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1) are always flushed to 0.0, with the sign of the operand.

(Input values greater than or equal to |1.11111111110100000000000B∗2¹²⁵| are guaranteed to not produce tiny results; input values less than or equal to |1.00000000000110000000001B∗2¹²⁶| are guaranteed to produce tiny results, which are in turn flushed to 0.0; and input values in between this range may or may not produce tiny results, depending on the implementation.)

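That last part about limits on value ranges seems backwards to me; larger inputs produce smaller outputs, so how can all inputs less than some number be guaranteed to produce tiny (subnormal) results? I think it is actually backwards, and all values of magnitude >= 1.00000000000110000000001B*2¹²⁶ are guaranteed to produce a 0 result via FTZ.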
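A quick arithmetic check is consistent with that reading. The sketch below assumes that an estimate may lie anywhere in [(1 - E)/|x|, (1 + E)/|x|] with E = 1.5 * 2^-12, and that "tiny" means a magnitude below 2^-126; the helper names are hypothetical. Multiplying by 2^-126 is exact, so the comparisons involve no rounding.

```c
#include <assert.h>
#include <math.h>

/* Bound quoted above: |relative error| <= 1.5 * 2^-12 (exact in double). */
static const double MAX_REL_ERR = 1.5 / 4096.0;

/* |x| * 2^-126, computed exactly in double. */
double scaled_magnitude(float x) { return fabs((double)x) * 0x1p-126; }

/* Even the smallest allowed estimate (1 - E)/|x| is >= 2^-126. */
int never_tiny(float x) { return scaled_magnitude(x) <= 1.0 - MAX_REL_ERR; }

/* Even the largest allowed estimate (1 + E)/|x| is < 2^-126. */
int always_tiny(float x) { return scaled_magnitude(x) > 1.0 + MAX_REL_ERR; }

void check_manual_constants(void)
{
    const float a = 0x1.ffdp125f;      /* 1.11111111110100000000000B * 2^125 */
    const float b = 0x1.001802p126f;   /* 1.00000000000110000000001B * 2^126 */
    /* a is the largest float that can never give a tiny result ... */
    assert(never_tiny(a) && !never_tiny(0x1.ffd002p125f));
    /* ... and b is the smallest float that always gives one. */
    assert(always_tiny(b) && !always_tiny(0x1.0018p126f));
}
```

Under these assumptions, the two constants in the manual are exactly the boundary values: magnitudes up to the first never produce a tiny result, and magnitudes from the second upward always do, which matches the reversed reading suggested here.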

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/73917495/
