我正在尝试将范围缩减作为实现正弦函数的第一步。
我正在遵循论文 "ARGUMENT REDUCTION FOR HUGE ARGUMENTS" by K.C. NG 中描述的方法
当使用 x 从 0 到 20000 的输入范围时,我得到的错误大到 0.002339146。我的错误显然不应该那么大,我不确定如何减少它。我注意到误差幅度与输入余弦/正弦的 theta 幅度相关。
我能够获得论文提到的 Nearpi.c 代码,但我不确定如何将代码用于单精度浮点。如果有人感兴趣,可以在此链接中找到 Nearpi.c 文件:nearpi.c
这是我的 MATLAB 代码:
x = 0:0.1:20000;
% Perform range reduction
% Store constant 2/pi
twooverpi = single(2/pi);
% Compute y
y = (x.*twooverpi);
% Compute k (round to nearest integer
k = round(y);
% Solve for f
f = single(y-k);
% Solve for r
r = single(f*single(pi/2));
% Find last two bits of k
n = bitand(fi(k,1,32,0),fi(3,1,32,0));
n = single(n);
% Preallocate for speed
z(length(x)) = 0;
for i = 1:length(x)
switch(n(i))
case 0
z(i)=sin(r(i));
case 1
z(i) = single(cos(r(i)));
case 2
z(i) = -sin(r(i));
case 3
z(i) = single(-cos(r(i)));
otherwise
end
end
maxerror = max(abs(single(z - single(sin(single(x))))))
minerror = min(abs(single(z - single(sin(single(x))))))
我已经编辑了程序 Nearpi.c 以便它可以编译。但是我不确定如何解释输出。该文件还需要一个输入,我必须手动输入,我也不确定输入的重要性。
这是工作的 Nearpi.c:
/*
============================================================================
Name : nearpi.c
Author :
Version :
Copyright : Your copyright notice
Description : Hello World in C, Ansi-style
============================================================================
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/*
* Global macro definitions.
*/
# define hex( double ) *(1 + ((long *) &double)), *((long *) &double)
# define sgn(a) (a >= 0 ? 1 : -1)
# define MAX_k 2500
# define D 56
# define MAX_EXP 127
# define THRESHOLD 2.22e-16
/*
* Global Variables
*/
int CFlength, /* length of CF including terminator */
binade;
double e,
f; /* [e,f] range of D-bit unsigned int of f;
form 1X...X */
// Function Prototypes
int dbleCF (double i[], double j[]);
void input (double i[]);
void nearPiOver2 (double i[]);
/*
* This is the start of the main program.
*/
int main (void)
{
int k; /* subscript variable */
double i[MAX_k],
j[MAX_k]; /* i and j are continued fractions
(coeffs) */
// fp = fopen("/src/cfpi.txt", "r");
/*
* Compute global variables e and f, where
*
* e = 2 ^ (D-1), i.e. the D bit number 10...0
* and
* f = 2 ^ D - 1, i.e. the D bit number 11...1 .
*/
e = 1;
for (k = 2; k <= D; k = k + 1)
e = 2 * e;
f = 2 * e - 1;
/*
* Compute the continued fraction for (2/e)/(pi/2) , i.e.
* q's starting value for the first binade, given the continued
* fraction for pi as input; set the global variable CFlength
* to the length of the resulting continued fraction (including
* its negative valued terminator). One should use as many
* partial coefficients of pi as necessary to resolve numbers
* of the width of the underflow plus the overflow threshold.
* A rule of thumb is 0.97 partial coefficients are generated
* for every decimal digit of pi .
*
* Note: for radix B machines, subroutine input should compute
* the continued fraction for (B/e)/(pi/2) where e = B ^ (D - 1).
*/
input (i);
/*
* Begin main loop over all binades:
* For each binade, find the nearest multiples of pi/2 in that binade.
*
* [ Note: for hexadecimal machines ( B = 16 ), the rest of the main
* program simplifies(!) to
*
* B_ade = 1;
* while (B_ade < MAX_EXP)
* {
* dbleCF (i, j);
* dbleCF (j, i);
* dbleCF (i, j);
* CFlength = dbleCF (j, i);
* B_ade = B_ade + 1;
* }
* }
*
* because the alternation of source & destination are no longer necessary. ]
*/
binade = 1;
while (binade < MAX_EXP)
{
/*
* For the current (odd) binade, find the nearest multiples of pi/2.
*/
nearPiOver2 (i);
/*
* Double the continued fraction to get to the next (even) binade.
* To save copying arrays, i and j will alternate as the source
* and destination for the continued fractions.
*/
CFlength = dbleCF (i, j);
binade = binade + 1;
/*
* Check for main loop termination again because of the
* alternation.
*/
if (binade >= MAX_EXP)
break;
/*
* For the current (even) binade, find the nearest multiples of pi/2.
*/
nearPiOver2 (j);
/*
* Double the continued fraction to get to the next (odd) binade.
*/
CFlength = dbleCF (j, i);
binade = binade + 1;
}
return 0;
} /* end of Main Program */
/*
* Subroutine DbleCF doubles a continued fraction whose partial
* coefficients are i[] into a continued fraction j[], where both
* arrays are of a type sufficient to do D-bit integer arithmetic.
*
* In my case ( D = 56 ) , I am forced to treat integers as double
* precision reals because my machine does not have integers of
* sufficient width to handle D-bit integer arithmetic.
*
* Adapted from a Basic program written by W. Kahan.
*
* Algorithm based on Hurwitz's method of doubling continued
* fractions (see Knuth Vol. 3, p.360).
*
* A negative value terminates the last partial quotient.
*
* Note: for the non-C programmers, the statement break
* exits a loop and the statement continue skips to the next
* case in the same loop.
*
* The call modf ( l / 2, &l0 ) assigns the integer portion of
* half of L to L0.
*/
int dbleCF (double i[], double j[])
{
double k,
l,
l0,
j0;
int n,
m;
n = 1;
m = 0;
j0 = i[0] + i[0];
l = i[n];
while (1)
{
if (l < 0)
{
j[m] = j0;
break;
};
modf (l / 2, &l0);
l = l - l0 - l0;
k = i[n + 1];
if (l0 > 0)
{
j[m] = j0;
j[m + 1] = l0;
j0 = 0;
m = m + 2;
};
if (l == 0) {
/*
* Even case.
*/
if (k < 0)
{
m = m - 1;
break;
}
else
{
j0 = j0 + k + k;
n = n + 2;
l = i[n];
continue;
};
}
/*
* Odd case.
*/
if (k < 0)
{
j[m] = j0 + 2;
break;
};
if (k == 0)
{
n = n + 2;
l = l + i[n];
continue;
};
j[m] = j0 + 1;
m = m + 1;
j0 = 1;
l = k - 1;
n = n + 1;
continue;
};
m = m + 1;
j[m] = -99999;
return (m);
}
/*
* Subroutine input computes the continued fraction for
* (2/e) / (pi/2) , where e = 2 ^ (D-1) , given pi 's
* continued fraction as input. That is, double the continued
* fraction of pi D-3 times and place a zero at the front.
*
* One should use as many partial coefficients of pi as
* necessary to resolve numbers of the width of the underflow
* plus the overflow threshold. A rule of thumb is 0.97
* partial coefficients are generated for every decimal digit
* of pi . The last coefficient of pi is terminated by a
* negative number.
*
* I'll be happy to supply anyone with the partial coefficients
* of pi . My ARPA address is mcdonald@ucbdali.BERKELEY.ARPA .
*
* I computed the partial coefficients of pi using a method of
* Bill Gosper's. I need only compute with integers, albeit
* large ones. After writing the program in bc and Vaxima ,
* Prof. Fateman suggested FranzLisp . To my surprise, FranzLisp
* ran the fastest! the reason? FranzLisp's Bignum package is
* hand coded in assembler. Also, FranzLisp can be compiled.
*
*
* Note: for radix B machines, subroutine input should compute
* the continued fraction for (B/e)/(pi/2) where e = B ^ (D - 1).
* In the case of hexadecimal ( B = 16 ), this is done by repeated
* doubling the appropriate number of times.
*/
void input (double i[])
{
int k;
double j[MAX_k];
/*
* Read in the partial coefficients of pi from a precalculated file
* until a negative value is encountered.
*/
k = -1;
do
{
k = k + 1;
scanf ("%lE", &i[k]);
printf("hello\n");
printf("%d", k);
} while (i[k] >= 0);
/*
* Double the continued fraction for pi D-3 times using
* i and j alternately as source and destination. On my
* machine D = 56 so D-3 is odd; hence the following code:
*
* Double twice (D-3)/2 times,
*/
for (k = 1; k <= (D - 3) / 2; k = k + 1)
{
dbleCF (i, j);
dbleCF (j, i);
};
/*
* then double once more.
*/
dbleCF (i, j);
/*
* Now append a zero on the front (reciprocate the continued
* fraction) and the return the coefficients in i .
*/
i[0] = 0;
k = -1;
do
{
k = k + 1;
i[k + 1] = j[k];
} while (j[k] >= 0);
/*
* Return the length of the continued fraction, including its
* terminator and initial zero, in the global variable CFlength.
*/
CFlength = k;
}
/*
* Given a continued fraction's coefficients in an array i ,
* subroutine nearPiOver2 finds all machine representable
* values near a integer multiple of pi/2 in the current binade.
*/
void nearPiOver2 (double i[])
{
int k, /* subscript for recurrences (see
handout) */
K; /* like k , but used during cancel. elim.
*/
double p[MAX_k], /* product of the q's (see
handout) */
q[MAX_k], /* successive tail evals of CF (see
handout) */
j[MAX_k], /* like convergent numerators (see
handout) */
tmp, /* temporary used during cancellation
elim. */
mk0, /* m[k - 1] (see
handout) */
mk, /* m[k] is one of the few ints (see
handout) */
mkAbs, /* absolute value of m sub k
*/
mK0, /* like mk0 , but used during cancel.
elim. */
mK, /* like mk , but used during cancel.
elim. */
z, /* the object of our quest (the argument)
*/
m0, /* the mantissa of z as a D-bit integer
*/
x, /* the reduced argument (see
handout) */
ldexp (), /* sys routine to multiply by a power of
two */
fabs (), /* sys routine to compute FP absolute
value */
floor (), /* sys routine to compute greatest int <=
value */
ceil (); /* sys routine to compute least int >=
value */
/*
* Compute the q's by evaluating the continued fraction from
* bottom up.
*
* Start evaluation with a big number in the terminator position.
*/
q[CFlength] = 1.0 + 30;
for (k = CFlength - 1; k >= 0; k = k - 1)
q[k] = i[k] + 1 / q[k + 1];
/*
* Let THRESHOLD be the biggest | x | that we are interesed in
* seeing.
*
* Compute the p's and j's by the recurrences from the top down.
*
* Stop when
*
* 1 1
* ----- >= THRESHOLD > ------ .
* 2 |j | 2 |j |
* k k+1
*/
p[0] = 1;
j[0] = 0;
j[1] = 1;
k = 0;
do
{
p[k + 1] = -q[k + 1] * p[k];
if (k > 0)
j[1 + k] = j[k - 1] - i[k] * j[k];
k = k + 1;
} while (1 / (2 * fabs (j[k])) >= THRESHOLD);
/*
* Then mk runs through the integers between
*
* k + k +
* (-1) e / p - 1/2 & (-1) f / p - 1/2 .
* k k
*/
for (mkAbs = floor (e / fabs (p[k]));
mkAbs <= ceil (f / fabs (p[k])); mkAbs = mkAbs + 1)
{
mk = mkAbs * sgn (p[k]);
/*
* For each mk , mk0 runs through integers between
*
* +
* m q - p THRESHOLD .
* k k k
*/
for (mk0 = floor (mk * q[k] - fabs (p[k]) * THRESHOLD);
mk0 <= ceil (mk * q[k] + fabs (p[k]) * THRESHOLD);
mk0 = mk0 + 1)
{
/*
* For each pair { mk , mk0 } , check that
*
* k
* m = (-1) ( j m - j m )
* 0 k-1 k k k-1
*/
m0 = (k & 1 ? -1 : 1) * (j[k - 1] * mk - j[k] * mk0);
/*
* lies between e and f .
*/
if (e <= fabs (m0) && fabs (m0) <= f)
{
/*
* If so, then we have found an
*
* k
* x = ((-1) m / p - m ) / j
* 0 k k k
*
* = ( m q - m ) / p .
* k k k-1 k
*
* But this later formula can suffer cancellation. Therefore,
* run the recurrence for the mk 's to get mK with minimal
* | mK | + | mK0 | in the hope mK is 0 .
*/
K = k;
mK = mk;
mK0 = mk0;
while (fabs (mK) > 0)
{
p[K + 1] = -q[K + 1] * p[K];
tmp = mK0 - i[K] * mK;
if (fabs (tmp) > fabs (mK0))
break;
mK0 = mK;
mK = tmp;
K = K + 1;
};
/*
* Then
* x = ( m q - m ) / p
* K K K-1 K
*
* as accurately as one could hope.
*/
x = (mK * q[K] - mK0) / p[K];
/*
* To return z and m0 as positive numbers,
* x must take the sign of m0 .
*/
x = x * sgn (m0);
m0 = fabs (m0);
/*d
* Set z = m0 * 2 ^ (binade+1-D) .
*/
z = ldexp (m0, binade + 1 - D);
/*
* Print z (hex), z (dec), m0 (dec), binade+1-D, x (hex), x (dec).
*/
printf ("%08lx %08lx Z=%22.16E M=%17.17G L+1-%d=%3d %08lx %08lx x=%23.16E\n", hex (z), z, m0, D, binade + 1 - D, hex (x), x);
}
}
}
}
最佳答案
理论
首先让我们注意使用单精度算术的区别。
f
的最小值可以更大。由于 double 数是单精度数的超集,因此最接近的 single
到 2/pi
的倍数只能离得更远了~2.98e-19
,因此 f
的固定算术表示中的前导零的数量最多必须有 61 个前导零(但可能会更少)。表示此数量 fdigits
. y
必须准确到 fdigits
+ 24(单精度非零有效位)+ 7(额外保护位)= fdigits
+ 31,最多 92。x
的指数的宽度,2/pi
必须包含 127(single
的最大指数)+ 31 + fdigits
,或 158 + 79104| 最多为 156 |位。fdigits
的大小由 A
中的零数决定在二进制小数点之前(并且不受移动到 x
的影响),而大小 single
由 9 之前的公式确定。C
( x
>=2^24), x
看起来像这样:[24 位,M 个零]。乘以 x
,其大小是第一个 A
M
的位, 将产生一个整数(2/pi
的零只会将所有内容都转换为整数)。 x
从C
开始一点点 M+d
将导致产品 2/pi
至多大小 x*C
.在 double 中,d-24
被选择为 174(而不是 24,我们有 53),因此产品的大小最多为 121。在 d
中,选择single
就够了使得 d
,或更准确地说,d-24 <= 92
.即,d-24 <= fdigits+31
可以选择为d
+55,或最多 116。fdigits
大小最多为 116 位。 因此,我们面临两个问题:
B
.这包括从链接的论文中阅读引用文献 6 并理解它。可能没那么容易。 :) 据我所知,这是唯一的地方 fdigits
用来。 nearpi.c
,B
的相关位.自 2/pi
低于 127,我们可以计算 M
的前 127+116 位离线并将它们存储在一个数组中。见 Wikipedia . 2/pi
.这涉及乘法y=x*B
由 116 位数字组成。这就是使用第 3 节的地方。块的大小选择为 24,因为 2*24 + 2(将两个 24 位数字相乘,并添加 3 个这样的数字)小于 x
的精度, 53(因为 24 除以 96)。我们可以将大小为 11 位的块用于 double
出于类似原因的算术。 注意 -
single
的技巧仅适用于指数为正数 (x>=2^24) 的数字。总结一下 - 首先,你必须用
B
解决问题精确。您的 double
代码在 Matlab
中不起作用精度也是如此(尝试删除 double
并计算 single
,因为您的 sin(2^53)
只有 53 个有效位,而不是 175 个(无论如何,您不能在 Matlab 中直接乘以如此精确的数字)。其次,该方案应该是适应与 twooverpi
一起工作,再次,关键问题是足够精确地表示 single
,并支持高精度数字的乘法。最后,当一切正常时,您可以尝试找出更好的 2/pi
来减少您必须存储和相乘的位数。希望我没有完全离开 - 欢迎评论和矛盾。
例子
例如,让我们计算
fdigits
哪里sin(x)
, 在有效位之后没有零 (x = single(2^24-1)
= 0)。这简化了查找 M
, 如 B
由 B
的前 116 位组成.自 2/pi
具有 24 位精度和 x
116 位,产品y = x * B
根据需要将具有 92 位精度。
链接论文中的第 3 节描述了如何以足够的精度执行该产品;相同的算法可用于大小为 11 的块来计算
B
在我们的情况下。苦差事,我希望原谅我没有明确这样做,而是依靠y
的符号数学工具箱。这个工具箱为我们提供了 Matlab
函数,它允许我们以十进制数字指定数字的精度。所以,vpa('2/pi', ceil(116*log10(2)))
将产生
vpa
的近似值至少 116 位精度。因为 2/pi
仅接受整数作为其精度参数,我们通常无法准确指定数字的二进制精度,因此我们使用次佳。以下代码计算
vpa
根据论文,在 sin(x)
精确 :x = single(2^24-1);
y = x * vpa('2/pi', ceil(116*log10(2))); % Precision = 103.075
k = round(y);
f = single(y - k);
r = f * single(pi) / 2;
switch mod(k, 4)
case 0
s = sin(r);
case 1
s = cos(r);
case 2
s = -sin(r);
case 3
s = -cos(r);
end
sin(x) - s % Expected value: exactly zero.
(
single
的精度是使用 y
获得的,结果证明它是比 Mathematica
更好的数值工具:))在
Matlab
这个问题的另一个答案(已被删除)引导我在
libm
中实现,虽然适用于 double 数字,但非常彻底地遵循链接的论文。查看文件 s_sin.c对于包装器(链接论文中的表 2 在文件末尾显示为
libm
语句)和 e_rem_pio2.c对于参数缩减代码(特别感兴趣的是包含 switch
的前 396 个十六进制数字的数组,从第 69 行开始)。
关于c++ - 范围缩减 单精度浮点精度差,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/9423516/