floating-point - Is "banker's rounding" really more numerically stable?

By banker's rounding I mean

1. "round to nearest, ties to even"

Rounds to the nearest value; if the number falls midway it is rounded to the nearest value with an even (zero) least significant bit. This is the default for binary floating-point and the recommended default for decimal.

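This method is said to be preferred over

2. "round to nearest, ties away from zero"

on the grounds that it "minimizes the expected error when summing over rounded figures". Apparently this is because "it does not suffer from negative or positive bias as much as the round half away from zero method over most reasonable distributions".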

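I fail to understand why that is the case. Intuitively, if 0.0 is rounded towards zero, then 0.5 "should" be rounded away from zero (as in method 2). That way, equal numbers of values would be rounded towards zero and away from zero. To put it simply, if floating-point numbers were represented with one decimal digit, then of the ten numbers 0.0, ..., 0.9, five would be rounded down and five would be rounded up under method 2. The same goes for 1.0, ..., 1.9, and so on.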

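Of course, floating-point numbers are represented with a binary mantissa, but I think the reasoning above still applies. Note that for IEEE 754 double precision, both integers and integers plus one half are exactly representable for absolute values up to approximately 2^52, so these exact values will actually show up in practice.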
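
The representability claim can be checked directly. This is a minimal sketch, assuming the platform float is an IEEE 754 double (true for typical CPython builds); the variable names are illustrative only.

```python
from fractions import Fraction

# Just below 2**52 the spacing between adjacent doubles is 0.5, so an
# integer plus one half is stored exactly.
below = 2.0**52 - 0.5
below_is_exact = Fraction(below) == Fraction(2**53 - 1, 2)

# From 2**52 upward the spacing is 1.0, so adding 0.5 lands exactly on a
# tie, which the default ties-to-even mode resolves to the even neighbour.
above = 2.0**52 + 0.5
above_collapses = above == 2.0**52

print(below_is_exact, above_collapses)
```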

So how is method 1 better?

Best answer

Yes! It really is more numerically stable.

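For the case you're looking at, the numbers [0.0, 0.1, ..., 0.9], note that under round-ties-away-from-zero only four of those numbers round down (0.1 through 0.4), five round up, and one (0.0) is unchanged by the rounding operation. That pattern then repeats for 1.0 through 1.9, 2.0 through 2.9, and so on. So on average, more values are rounded away from zero than towards it. But under round-ties-to-even, we'd get:

five values rounding down and four rounding up in [0.0, 0.9]

four values rounding down and five rounding up in [1.0, 1.9]

and so on. On average, we get the same number of values rounding up as rounding down. More importantly, the expected error introduced by the rounding is (under suitable assumptions on the distribution of the inputs) closer to zero.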
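
The tallies above can be reproduced mechanically. The sketch below uses the standard decimal module so that the tenths are exact decimal values; decimal's ROUND_HALF_UP mode rounds ties away from zero and ROUND_HALF_EVEN rounds ties to even. The helper name tally is hypothetical and not part of the original answer.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def tally(start, mode):
    """Count how many of start.0 .. start.9 round down, stay put, or round up."""
    down = same = up = 0
    for tenth in range(10):
        value = Decimal(start) + Decimal(tenth) / 10
        rounded = value.quantize(Decimal(1), rounding=mode)
        if rounded < value:
            down += 1
        elif rounded > value:
            up += 1
        else:
            same += 1
    return down, same, up

print(tally(0, ROUND_HALF_UP))    # (4, 1, 5): ties away from zero
print(tally(0, ROUND_HALF_EVEN))  # (5, 1, 4): 0.5 goes down to 0
print(tally(1, ROUND_HALF_EVEN))  # (4, 1, 5): 1.5 goes up to 2
```

Under ties-to-even the surplus alternates between "down" and "up" from one unit interval to the next, which is why it cancels on average.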

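Here's a quick demonstration using Python. To avoid difficulties due to the Python 2 / Python 3 differences in the built-in round function, we give two Python-version-agnostic rounding functions: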

def round_ties_to_even(x):
    """
    Round a float x to the nearest integer, rounding ties to even.
    """
    if x < 0:
        return -round_ties_to_even(-x)  # use symmetry
    int_part, frac_part = divmod(x, 1)
    return int(int_part) + (
        frac_part > 0.5
        or (frac_part == 0.5 and int_part % 2.0 == 1.0))

def round_ties_away_from_zero(x):
    """
    Round a float x to the nearest integer, rounding ties away from zero.
    """
    if x < 0:
        return -round_ties_away_from_zero(-x)  # use symmetry
    int_part, frac_part = divmod(x, 1)
    return int(int_part) + (frac_part >= 0.5)

Now we look at the average error introduced by applying these two functions to decimal values with one digit after the point in the range [50.0, 100.0]:

>>> test_values = [n / 10.0 for n in range(500, 1001)]
>>> errors_even = [round_ties_to_even(value) - value for value in test_values]
>>> errors_away = [round_ties_away_from_zero(value) - value for value in test_values]

We use the recently added statistics standard library module to compute the mean and standard deviation of those errors:

>>> import statistics
>>> statistics.mean(errors_even), statistics.stdev(errors_even)
(0.0, 0.2915475947422656)
>>> statistics.mean(errors_away), statistics.stdev(errors_away)
(0.0499001996007984, 0.28723681870533313)

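The key point here is that errors_even has zero mean: the average error is zero. But errors_away has a positive mean: the average error is biased away from zero.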
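
The size of that positive mean can be derived exactly. Among the 501 test values, the non-tie errors cancel within each unit interval, so only the 50 ties (50.5, 51.5, ..., 99.5) matter: rounding every tie up contributes 50 × 0.5 = 25, giving a mean of 25/501. The sketch below redoes the experiment in exact rational arithmetic; mean_error and its tie_goes_up argument are hypothetical names used only for this illustration.

```python
from fractions import Fraction

def mean_error(tie_goes_up):
    """Exact mean rounding error over 50.0, 50.1, ..., 100.0.

    tie_goes_up(whole) decides whether the tie whole + 1/2 rounds up.
    """
    half = Fraction(1, 2)
    errors = []
    for n in range(500, 1001):
        value = Fraction(n, 10)
        whole, frac = divmod(value, 1)
        if frac > half or (frac == half and tie_goes_up(whole)):
            rounded = whole + 1
        else:
            rounded = whole
        errors.append(rounded - value)
    return sum(errors) / len(errors)

# For positive values, "away from zero" means every tie rounds up.
away = mean_error(lambda whole: True)
# Ties-to-even rounds up only when the integer part is odd.
even = mean_error(lambda whole: whole % 2 == 1)

print(away, float(away))  # 25/501, about 0.0499
print(even)               # 0
```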

A more realistic example

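Here's a semi-realistic example that demonstrates the bias from round-ties-away-from-zero in a numerical algorithm. We're going to compute the sum of a list of floats using the pairwise summation algorithm. This algorithm breaks the sum to be computed into two roughly equal parts, recursively sums those two parts, and then adds the results. It's substantially more accurate than a naive sum, though typically not as good as more sophisticated algorithms like Kahan summation. It's the algorithm used by NumPy's sum function. Here's a simple Python implementation.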

import operator

def pairwise_sum(xs, i, j, add=operator.add):
    """
    Return the sum of floats xs[i:j] (0 <= i <= j <= len(xs)),
    using pairwise summation.
    """
    count = j - i
    if count >= 2:
        k = (i + j) // 2
        return add(pairwise_sum(xs, i, k, add),
                   pairwise_sum(xs, k, j, add))
    elif count == 1:
        return xs[i]
    else:  # count == 0
        return 0.0

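We've included a parameter add in the function above, representing the operation to be used for addition. By default, it uses Python's normal addition, which on a typical machine resolves to standard IEEE 754 addition using the round-ties-to-even rounding mode.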
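
To make the accuracy claim concrete, here is a small sketch that compares a naive left-to-right loop with pairwise summation, using math.fsum (which returns the correctly rounded sum) as the reference. The functions pairwise, naive and compare are compact stand-ins written for this illustration, not the pairwise_sum defined above; pairwise slices the list instead of passing indices.

```python
import math
import operator

def pairwise(xs, add=operator.add):
    """Compact pairwise summation; slicing copies the halves."""
    if len(xs) <= 1:
        return xs[0] if xs else 0.0
    mid = len(xs) // 2
    return add(pairwise(xs[:mid], add), pairwise(xs[mid:], add))

def naive(xs):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in xs:
        total += x
    return total

def compare(xs):
    """Return (naive error, pairwise error) relative to math.fsum."""
    reference = math.fsum(xs)
    return abs(naive(xs) - reference), abs(pairwise(xs) - reference)

naive_err, pair_err = compare([0.1] * 2**16)
print(naive_err, pair_err)
```

This input is a deliberately favourable case: with 2**16 identical values every pairwise addition is an exact doubling, so the pairwise error is zero. Random inputs give less clear-cut differences.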

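We want to look at the expected error from the pairwise_sum function, using both standard addition and a round-ties-away-from-zero version of addition. Our first problem is that we don't have an easy and portable way to change the hardware's rounding mode from within Python, and a software implementation of binary floating-point would be large and slow. Fortunately, there's a trick we can use to get round-ties-away-from-zero while still using hardware floating-point. For the first part of that trick, we can use Knuth's "2Sum" algorithm to add two floats and obtain the correctly rounded sum together with the exact error in that sum: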

def exact_add(a, b):
    """
    Add floats a and b, giving a correctly rounded sum and exact error.

    Mathematically, a + b is exactly equal to sum + error.
    """
    # This is Knuth's 2Sum algorithm. See section 4.3.2 of the Handbook
    # of Floating-Point Arithmetic for exposition and proof.
    sum = a + b
    bv = sum - a
    error = (a - (sum - bv)) + (b - bv)
    return sum, error
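
The exactness property stated in the docstring can be verified with rational arithmetic, since fractions.Fraction converts a float to its exact value. This is a minimal check, assuming IEEE 754 doubles and no overflow; two_sum restates the function above under a different name so the snippet runs on its own.

```python
from fractions import Fraction

def two_sum(a, b):
    """Restatement of the 2Sum steps: rounded sum plus exact error."""
    s = a + b
    bv = s - a
    err = (a - (s - bv)) + (b - bv)
    return s, err

def is_exact(a, b):
    """True if a + b equals s + err when both sides are taken exactly."""
    s, err = two_sum(a, b)
    return Fraction(a) + Fraction(b) == Fraction(s) + Fraction(err)

pairs = [(0.1, 0.2), (1.0, 2.0**-53), (1e16, 1.0)]
print([is_exact(a, b) for a, b in pairs])
```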

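With this in hand, we can easily use the error term to determine when the exact sum is a tie. We have a tie if and only if error is nonzero and sum + 2*error is exactly representable, and in that case sum and sum + 2*error are the two floats nearest that tie. Using this idea, here's a function that adds two numbers and gives a correctly rounded result, but rounds ties away from zero.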

def add_ties_away(a, b):
    """
    Return the sum of a and b. Ties are rounded away from zero.
    """
    sum, error = exact_add(a, b)
    sum2, error2 = exact_add(sum, 2.0*error)
    if error2 or not error:
        # Not a tie.
        return sum
    else:
        # Tie. Choose the larger of sum and sum2 in absolute value.
        return max([sum, sum2], key=abs)
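
A short usage sketch shows where the two addition modes differ. The exact value of 1.0 + 2**-53 lies halfway between 1.0 and the next double, 1.0 + 2**-52, so it is a tie. The two helpers are restated compactly from the text so the snippet is self-contained; IEEE 754 doubles with the default rounding mode are assumed.

```python
def exact_add(a, b):
    # Restated from the text: rounded sum plus exact error (2Sum).
    total = a + b
    bv = total - a
    return total, (a - (total - bv)) + (b - bv)

def add_ties_away(a, b):
    # Restated from the text: correctly rounded sum, ties away from zero.
    total, error = exact_add(a, b)
    total2, error2 = exact_add(total, 2.0 * error)
    if error2 or not error:
        return total  # not a tie
    return max([total, total2], key=abs)

tiny = 2.0**-53            # half the gap between 1.0 and the next double
next_up = 1.0 + 2.0**-52   # the double immediately above 1.0

default_result = 1.0 + tiny                 # hardware addition: ties to even
away_result = add_ties_away(1.0, tiny)      # tie resolved away from zero
negative_away = add_ties_away(-1.0, -tiny)  # symmetric for negative sums
non_tie = add_ties_away(0.1, 0.2)           # no tie: same as 0.1 + 0.2

print(default_result == 1.0, away_result == next_up)
```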

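Now we can compare results. sample_sum_errors is a function that generates a list of floats in the range [1, 2], adds them using both normal round-ties-to-even addition and our custom round-ties-away-from-zero version, compares each result with the exact sum, and returns the errors for both versions, measured in units in the last place.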

import fractions
import random

def sample_sum_errors(sample_size=1024):
    """
    Generate `sample_size` floats in the range [1.0, 2.0], sum
    using both addition methods, and return the two errors in ulps.
    """
    xs = [random.uniform(1.0, 2.0) for _ in range(sample_size)]
    to_even_sum = pairwise_sum(xs, 0, len(xs))
    to_away_sum = pairwise_sum(xs, 0, len(xs), add=add_ties_away)

    # Assuming IEEE 754, each value in xs becomes an integer when
    # scaled by 2**52; use this to compute an exact sum as a Fraction.
    common_denominator = 2**52
    exact_sum = fractions.Fraction(
        sum(int(m*common_denominator) for m in xs),
        common_denominator)

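    # Result will be in [1024, 2048]. Errors are reported in units of 2**-44,
    # the "ulp" used throughout; note that adjacent doubles in [1024, 2048)
    # are actually 2**-42 apart, i.e. four of these units.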
    ulp = 2**-44
    to_even_error = (fractions.Fraction(to_even_sum) - exact_sum) / ulp
    to_away_error = (fractions.Fraction(to_away_sum) - exact_sum) / ulp

    return to_even_error, to_away_error
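
The exact-sum step relies on every input being an integer multiple of 2**-52. The following sketch checks that assumption for values drawn the same way, under IEEE 754 doubles; as_exact_fraction is a hypothetical helper introduced only for this check.

```python
import random
from fractions import Fraction

def as_exact_fraction(x, scale=2**52):
    """Exact value of x as a Fraction, assuming x * scale is an integer."""
    scaled = x * scale  # scaling by a power of two is exact (barring overflow)
    if not scaled.is_integer():
        raise ValueError("x is not a multiple of 1/scale")
    return Fraction(int(scaled), scale)

xs = [random.uniform(1.0, 2.0) for _ in range(1000)]

# Fraction(x) is the exact value of the float, so the two must agree.
all_match = all(as_exact_fraction(x) == Fraction(x) for x in xs)
print(all_match)
```

Values below 1.0 can need a finer grid than 2**-52, which is why the helper raises instead of silently truncating.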

Here's an example run:

>>> sample_sum_errors()
(1.6015625, 9.6015625)

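So that's an error of 1.6 ulps using standard addition, and an error of 9.6 ulps when rounding ties away from zero. It certainly looks as though the ties-away-from-zero method is worse, but a single run isn't particularly convincing. Let's do this 10000 times, with a different random sample each time, and plot the errors we get. Here's the code: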

import statistics
import numpy as np
import matplotlib.pyplot as plt

def show_error_distributions():
    errors = [sample_sum_errors() for _ in range(10000)]
    to_even_errors, to_away_errors = zip(*errors)
    print("Errors from ties-to-even: "
          "mean {:.2f} ulps, stdev {:.2f} ulps".format(
              statistics.mean(to_even_errors),
              statistics.stdev(to_even_errors)))
    print("Errors from ties-away-from-zero: "
          "mean {:.2f} ulps, stdev {:.2f} ulps".format(
              statistics.mean(to_away_errors),
              statistics.stdev(to_away_errors)))

    ax1 = plt.subplot(2, 1, 1)
    plt.hist(to_even_errors, bins=np.arange(-7, 17, 0.5))
    ax2 = plt.subplot(2, 1, 2)
    plt.hist(to_away_errors, bins=np.arange(-7, 17, 0.5))
    ax1.set_title("Errors from ties-to-even (ulps)")
    ax2.set_title("Errors from ties-away-from-zero (ulps)")
    ax1.xaxis.set_visible(False)
    plt.show()

When I run the above function on my machine, I see:

Errors from ties-to-even: mean 0.00 ulps, stdev 1.81 ulps
Errors from ties-away-from-zero: mean 9.76 ulps, stdev 1.40 ulps

and I get the following plot:

[Figure: histograms of errors from the two rounding methods]

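I was planning to go one step further and perform a statistical test for bias on the two samples, but the bias from the ties-away-from-zero method is so marked that it looks unnecessary. Interestingly, while the ties-away-from-zero method gives poorer results, it does give a smaller spread of errors.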
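
For readers who want to see what such a test might look like, here is a rough sketch of a one-sample t statistic for the hypothesis that the mean error is zero. The choice of a t statistic is an assumption made for illustration, not something the answer specifies, and both function names are hypothetical.

```python
import math
import statistics

def t_statistic(sample):
    """One-sample t statistic for the hypothesis that the mean is zero."""
    n = len(sample)
    return statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))

def t_from_summary(mean, stdev, n):
    """Same statistic computed from summary figures instead of raw data."""
    return mean / (stdev / math.sqrt(n))

# Summary figures reported above for ties-away-from-zero.
print(t_from_summary(9.76, 1.40, 10000))
```

With the reported figures (mean 9.76, stdev 1.40, n = 10000) the statistic comes out at roughly 697, far beyond any conventional significance threshold, which is consistent with the remark that a formal test looks unnecessary.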

A similar question to floating-point - Is "banker's rounding" really more numerically stable? can be found on Stack Overflow: https://stackoverflow.com/questions/45223778/
