c++ - Newton Raphson with SSE2 - can someone explain these 3 lines?

I stumbled upon these three lines of code:

The SIMD version is already quite a bit faster, but we can do better. Intel has added a fast 1/sqrt(x) function to the SSE2 instruction set. The only drawback is that its precision is limited. We need the precision, so we refine it using Newton-Raphson:

 __m128 nr = _mm_rsqrt_ps( x ); 
 __m128 muls = _mm_mul_ps( _mm_mul_ps( x, nr ), nr ); 
 result = _mm_mul_ps( _mm_mul_ps( half, nr ), _mm_sub_ps( three, muls ) );

This code assumes the existence of a __m128 variable named 'half' (four times 0.5f) and a variable 'three' (four times 3.0f).

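I know how to use Newton Raphson to find a zero of a function, and I know how to use it to compute the square root of a number, but I just can't see how this code performs it.

Can someone explain it to me?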

Best answer

Given the Newton iteration y_{n+1} = y_n * (3 - x * y_n^2) / 2, it should be straightforward to see this in the source code.

 __m128 nr   = _mm_rsqrt_ps( x );                     // The initial approximation y_0
 __m128 muls = _mm_mul_ps( _mm_mul_ps( x, nr ), nr ); // muls = x*nr*nr == x*(y_0)^2
 result = _mm_mul_ps(
                _mm_sub_ps( three, muls ),   // 3.0 - muls
                _mm_mul_ps( half, nr )       // multiplied by y_0 / 2, i.e. y_0 * 0.5
 );

To be precise, this algorithm is for the inverse square root.
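For reference, the iteration can be obtained by applying Newton's method to f(y) = 1/y^2 - x, whose positive root is y = 1/sqrt(x). The choice of f here is the standard one for this purpose, but it is not spelled out in the answer, so treat the following as a sketch of the derivation in LaTeX notation:

```latex
\begin{aligned}
f(y) &= \frac{1}{y^2} - x, \qquad f'(y) = -\frac{2}{y^3} \\
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n + \frac{y_n^3}{2}\left(\frac{1}{y_n^2} - x\right) \\
        &= y_n + \frac{y_n - x\,y_n^3}{2}
         = \frac{y_n\left(3 - x\,y_n^2\right)}{2}
\end{aligned}
```

The final form contains only multiplications and one subtraction, which is why it maps onto three SSE instructions without any division.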

Note that this still doesn't give a fully accurate result. rsqrtps with one NR iteration gives almost 23 bits of accuracy, versus sqrtps's 24 bits with a correctly rounded last bit.

The limited accuracy is a problem if you want to truncate the result to integer: (int)4.99999 is 4. Also, if you use sqrt(x) ~= x * rsqrt(x), watch out for the x == 0.0 case, because 0 * +Inf = NaN.

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14752399/
