algorithm - 我可以确定性地对任意排列的 float 的向量求和吗?

标签 algorithm math floating-point deterministic

假设我有一个(可能很大)的浮点数向量,这些向量是由一些黑盒过程产生的。是否可以计算这些数字的按位可重现的总和?
如果黑盒过程总是以相同的顺序生成数字,那么按位可重现的求和很容易:只需从左到右求和即可。
但是,如果数字以随机顺序生成,也许是因为它们是从异步进程返回和收集的,那么就更难了:我必须对它们进行数字排序。
但是,如果有更多的数字,可能分布在不同的机器上,因此移动它们是不可取的怎么办?
还有一种方法可以确定地对它们求和吗?

最佳答案

概述:多 channel 数据变异方法
(请参阅 here 以获得更好的答案。)
是的,有办法。
Demmel 和 Nguyen (2013) 的论文“Fast Reproducible Floating-Point Summation”中描述了一种方法。我在下面包含了它的串行和并行实现。
这里我们将比较三种算法:

  • 传统的从左到右求和:对于输入顺序中的排列,这是快速、不准确且不可重现的。
  • Kahan summation :这更慢,非常准确(输入大小本质上是 O(1)),并且虽然在输入顺序的排列方面不可再现,但更接近狭义的再现性。
  • 确定性求和:这比 Kahan 求和要慢一些,可以非常准确,并且可以按位重现。

  • 传统的求和是不准确的,因为随着总和的增长,我们添加到它的数字的最低有效位会被悄悄删除。 Kahan 求和通过保持运行的“补偿”总和来解决这个问题,该总和保持在最低有效数字上。确定性求和使用类似的策略来保持准确性,但在执行求和之前,将输入数字“预舍入”到一个公共(public)基数,以便可以准确计算它们的总和而不会出现任何舍入错误。
    测试
    为了研究算法,我们将对从 [-1000, 1000] 范围内均匀抽取的 100 万个数字进行测试。为了展示算法的答案范围及其确定性,输入将被打乱并求和 100 次。并行算法使用 12 个线程运行。
    “正确”求和(通过在 long double 模式下使用 Kahan 求和并选择最常出现的值来确定)是
    310844.700699143717685046795
    
    我要断言的一个结果(下面的代码严格证明了这一点)是,对于所研究的每种数据类型,确定性算法对 100 个数据排序中的每一个都产生相同的结果,并且这些结果对于算法。
    各种算法的结果如下:
    Serial Float
    =========================================================
    Simple summation accumulation type   = float
    Average deterministic summation time = 0.00462s
    Average simple summation time        = 0.00144s (3.20x faster than deterministic)
    Average Kahan summation time         = 0.00290s (1.59x faster than deterministic)
    Deterministic value                  = 256430592.000000000 (very different from actual)
    Distinct Kahan values                = 3  (with single-precision accumulator)
    Distinct Simple values               = 93 (with single-precision accumulator)
    Distinct Simple values               = 1  (with double-precision accumulator)
    
    Parallel Float
    =========================================================
    Simple summation accumulation type   = float
    Average deterministic summation time = 0.00576s
    Average simple summation time        = 0.00206s (2.79x faster than deterministic)
    Average Kahan summation time         = 0.00599s (0.96x faster than deterministic)
    Deterministic value                  = 256430592.000000000 (very different from actual)
    Distinct Kahan values                = 3  (with single-precision accumulator)
    Distinct Simple values               = 89 (with single-precision accumulator)
    Distinct Simple values               = 1  (with double-precision accumulator)
    
    Serial Double
    =========================================================
    Average deterministic summation time = 0.00600s
    Average simple summation time        = 0.00171s (3.49x faster than deterministic)
    Average Kahan summation time         = 0.00378s (1.58x faster than deterministic)
    Deterministic value                  = 310844.70069914375199005 (epsilon difference from actual value)
    Distinct Kahan values                = 4
    Distinct Simple values               = 88
    
    Parallel Double
    =========================================================
    Average deterministic summation time = 0.01074s
    Average simple summation time        = 0.00289s (3.71x faster than deterministic)
    Average Kahan summation time         = 0.00648s (1.65x faster than deterministic)
    Deterministic value                  = 310844.70069914375199005 (epsilon difference from actual value)
    Distinct Kahan values                = 2
    Distinct Simple values               = 83
    
    Serial Long Double
    =========================================================
    Average deterministic summation time = 0.01072s
    Average simple summation time        = 0.00215s (4.96x faster than deterministic)
    Average Kahan summation time         = 0.00448s (2.39x faster than deterministic)
    Deterministic value                  = 310844.700699143717685046795 (no discernable difference from actual)
    Distinct Kahan values                = 3
    Distinct Simple values               = 94
    
    Parallel Long Double
    =========================================================
    Average deterministic summation time = 0.01854s
    Average simple summation time        = 0.00466s (3.97x faster than deterministic)
    Average Kahan summation time         = 0.00635s (2.91x faster than deterministic)
    Deterministic value                  = 310844.700699143717685046795 (no discernable difference from actual)
    Distinct Kahan values                = 1
    Distinct Simple values               = 82
    
    讨论
    所以我们学了什么?从单精度结果中我们看到,使用双倍长度累加器使得传统的求和算法对于这个数据集是可重现的,尽管对于任意数据集几乎肯定不会出现这种情况。尽管如此,这仍然是一种在工作时提高准确性的廉价方法。
    当传统求和使用的累加器与被添加的数字大小相同时,它的工作非常糟糕:我们运行了 100 次测试,得到了与传统求和几乎一样多的不同结果。
    另一方面,Kahan 求和产生的不同结果很少(因此它“更具”确定性)并且花费的时间大约是传统求和的两倍。
    确定性算法比简单求和需要大约 4 倍的时间,比 Kahan 求和长约 1.5-3 倍,但对于 double 和 long double 类型得到的答案大致相同,除了按位确定性(无论输入如何,它总是得到相同的答案)数据已排序)。
    然而,单精度浮点得到的答案非常糟糕,与单精度常规求和和 Kahan 求和不同,后者与实际答案非常接近。为什么是这样?
    我们一直在研究的论文确定,如果输入中有 n 个数字并且我们使用 k 轮折叠(论文推荐 2,这是我们在这里使用的),那么确定性和常规和将具有相似的误差范围提供
    n^k * e^(k-1) << 1
    
    其中 e 是数据类型的浮点 epsilon。这些 epsilon 值是:
    Single-precision epsilon      = 0.000000119
    Double-precision epsilon      = 0.00000000000000022
    Long double-precision epsilon = 0.000000000000000000108
    
    将这些与 n=1M 一起代入方程,我们得到:
    Single condition      = 119000
    Double condition      = 0.00022
    Long double condition = 0.000000108
    
    所以我们看到,对于这种大小的输入,预计单精度会表现不佳。
    另一点要注意的是,虽然 long double 在我的机器上占用 16 个字节,但这仅用于对齐目的:用于计算的真实长度仅为 80 位。因此,两个 long double 将在数值上相等地进行比较,但它们未使用的位是任意的。对于真正的按位再现性,需要确定未使用的位并将其设置为已知值。
    代码
    //Compile with:
    //g++ -g -O3 repro_vector.cpp -fopenmp
    
    //NOTE: Always comile with `-g`. It doesn't slow down your code, but does make
    //it debuggable and improves ease of profiling
    
    #include <algorithm>
    #include <bitset>               //Used for showing bitwise representations
    #include <cfenv>                //Used for setting floating-point rounding modes
    #include <chrono>               //Used for timing algorithms
    #include <climits>              //Used for showing bitwise representations
    #include <iostream>
    #include <omp.h>                //OpenMP
    #include <random>
    #include <stdexcept>
    #include <string>
    #include <typeinfo>
    #include <unordered_map>
    #include <vector>
    
    constexpr int ROUNDING_MODE = FE_UPWARD;
    constexpr int N = 1'000'000;
    constexpr int TESTS = 100;
    
    // Simple timer class for tracking cumulative run time of the different
    // algorithms
    struct Timer {
      double total = 0;
      std::chrono::high_resolution_clock::time_point start_time;
    
      Timer() = default;
    
      void start() {
        start_time = std::chrono::high_resolution_clock::now();
      }
    
      void stop() {
        const auto now = std::chrono::high_resolution_clock::now();
        const auto time_span = std::chrono::duration_cast<std::chrono::duration<double>>(now - start_time);
        total += time_span.count();
      }
    };
    
    //Simple class to enable directed rounding in floating-point math and to reset
    //the rounding mode afterwards, when it goes out of scope
    struct SetRoundingMode {
      const int old_rounding_mode;
    
      SetRoundingMode(const int mode) : old_rounding_mode(fegetround()) {
        if(std::fesetround(mode)!=0){
          throw std::runtime_error("Failed to set directed rounding mode!");
        }
      }
    
      ~SetRoundingMode(){
        if(std::fesetround(old_rounding_mode)!=0){
          throw std::runtime_error("Failed to reset rounding mode to original value!");
        }
      }
    
      static std::string get_rounding_mode_string() {
        switch (fegetround()) {
          case FE_DOWNWARD:   return "downward";
          case FE_TONEAREST:  return "to-nearest";
          case FE_TOWARDZERO: return "toward-zero";
          case FE_UPWARD:     return "upward";
          default:            return "unknown";
        }
      }
    };
    
    // Used to make showing bitwise representations somewhat more intuitive
    template<class T>
    struct binrep {
      const T val;
      binrep(const T val0) : val(val0) {}
    };
    
    // Display the bitwise representation
    template<class T>
    std::ostream& operator<<(std::ostream& out, const binrep<T> a){
      const char* beg = reinterpret_cast<const char*>(&a.val);
      const char *const end = beg + sizeof(a.val);
      while(beg != end){
        out << std::bitset<CHAR_BIT>(*beg++);
        if(beg < end)
          out << ' ';
      }
      return out;
    }
    
    
    
    //Simple serial summation algorithm with an accumulation type we can specify
    //to more fully explore its behaviour
    template<class FloatType, class SimpleAccumType>
    FloatType serial_simple_summation(const std::vector<FloatType> &vec){
      SimpleAccumType sum = 0;
      for(const auto &x: vec){
        sum += x;
      }
      return sum;
    }
    
    //Parallel variant of the simple summation algorithm above
    template<class FloatType, class SimpleAccumType>
    FloatType parallel_simple_summation(const std::vector<FloatType> &vec){
      SimpleAccumType sum = 0;
      #pragma omp parallel for default(none) reduction(+:sum) shared(vec)
      for(size_t i=0;i<vec.size();i++){
        sum += vec[i];
      }
      return sum;
    }
    
    
    
    //Kahan's compensated summation algorithm for accurately calculating sums of
    //many numbers with O(1) error
    template<class FloatType>
    FloatType serial_kahan_summation(const std::vector<FloatType> &vec){
      FloatType sum = 0.0f;
      FloatType c = 0.0f;
      for (const auto &num: vec) {
        const auto y = num - c;
        const auto t = sum + y;
        c = (t - sum) - y;
        sum = t;
      }
      return sum;
    }
    
    //Parallel version of Kahan's compensated summation algorithm (could be improved
    //by better accounting for the compsenation during the reduction phase)
    template<class FloatType>
    FloatType parallel_kahan_summation(const std::vector<FloatType> &vec){
      //Parallel phase
      std::vector<FloatType> sum(omp_get_max_threads(), 0);
      FloatType c = 0;
      #pragma omp parallel for default(none) firstprivate(c) shared(sum,vec)
      for (size_t i=0;i<vec.size();i++) {
        const auto tid = omp_get_thread_num();
        const auto y = vec[i] - c;
        const auto t = sum.at(tid) + y;
        c = (t - sum[tid]) - y;
        sum[tid] = t;
      }
    
      //Serial reduction phase
    
      //This could be more accurate if it took the remaining compensation values
      //from above into account
      FloatType total_sum = 0.0f;
      FloatType total_c = 0.0f;
      for(const auto &num: sum){
        const auto y = num - total_c;
        const auto t = total_sum + y;
        total_c = (t - total_sum) - y;
        total_sum = t;
      }
    
      return total_sum;
    }
    
    
    
    // Error-free vector transformation. Algorithm 4 from Demmel and Nguyen (2013)
    template<class FloatType>
    FloatType ExtractVectorNew2(
      const FloatType M,
      const typename std::vector<FloatType>::iterator &begin,
      const typename std::vector<FloatType>::iterator &end
    ){
      // Should use the directed rounding mode of the parent thread
    
      auto Mold = M;
      for(auto v=begin;v!=end;v++){
        auto Mnew = Mold + (*v);
        auto q = Mnew - Mold;
        (*v) -= q;
        Mold = Mnew;
      }
    
      //This is the exact sum of high order parts q_i
      //v is now the vector of low order parts r_i
      return Mold - M;
    }
    
    template<class FloatType>
    FloatType mf_from_deltaf(const FloatType delta_f){
      const int power = std::ceil(std::log2(delta_f));
      return static_cast<FloatType>(3.0) * std::pow(2, power);
    }
    
    //Implements the error bound discussed near Equation 6 of
    //Demmel and Nguyen (2013).
    template<class FloatType>
    bool is_error_bound_appropriate(const size_t N, const int k){
      const auto eps = std::numeric_limits<FloatType>::epsilon();
      const auto ratio = std::pow(N, k) * std::pow(eps, k-1);
      //If ratio << 1, then the conventional non-reproducible sum and the
      //deterministic sum will have error bounds of the same order. We arbitrarily
      //choose 1e-4 to represent this
      return ratio < 1e-3;
    }
    
    //Serial bitwise deterministic summation.
    //Algorithm 8 from Demmel and Nguyen (2013).
    template<class FloatType>
    FloatType serial_bitwise_deterministic_summation(
      std::vector<FloatType> vec,  // Note that we're making a copy!
      const int k
    ){
      constexpr FloatType eps = std::numeric_limits<FloatType>::epsilon();
      const auto n = vec.size();
      const auto adr = SetRoundingMode(ROUNDING_MODE);
    
      if(n==0){
        return 0;
      }
    
      if(!is_error_bound_appropriate<FloatType>(vec.size(), k)){
        std::cout<<"WARNING! Error bounds of deterministic sum are large relative to conventional summation!"<<std::endl;
      }
    
      FloatType m = std::abs(vec.front());
      for(const auto &x: vec){
        m = std::max(m, std::abs(x));
      }
    
      FloatType delta_f = n * m / (1 - 4 * (n + 1) * eps);
      FloatType Mf = mf_from_deltaf(delta_f);
    
      std::vector<FloatType> Tf(k);
      for(int f=0;f<k-1;f++){
        Tf[f] = ExtractVectorNew2<FloatType>(Mf, vec.begin(), vec.end());
        delta_f = n * (4 * eps * Mf / 3) / (1 - 4 * (n + 1) * eps);
        Mf = mf_from_deltaf(delta_f);
      }
    
      FloatType M = Mf;
      for(const FloatType &v: vec){
        M += v;
      }
      Tf[k-1] = M - Mf;
    
      FloatType T = 0;
      for(const FloatType &tf: Tf){
        T += tf;
      }
    
      return T;
    }
    
    //Parallel bitwise deterministic summation.
    //Algorithm 9 from Demmel and Nguyen (2013).
    template<class FloatType>
    FloatType parallel_bitwise_deterministic_summation(
      std::vector<FloatType> vec,  // Note that we're making a copy!
      const int k
    ){
      constexpr FloatType eps = std::numeric_limits<FloatType>::epsilon();
      const auto n = vec.size();
      const auto adr = SetRoundingMode(ROUNDING_MODE);
    
      if(n==0){
        return 0;
      }
    
      if(!is_error_bound_appropriate<FloatType>(vec.size(), k)){
        std::cout<<"WARNING! Error bounds of deterministic sum are large relative to conventional summation!"<<std::endl;
      }
    
      std::vector<FloatType> Tf(k);
    
      // Note that this reduction would require communication if we had
      // implemented this to work on multiple nodes
      FloatType m = std::abs(vec.front());
      #pragma omp parallel for default(none) reduction(max:m) shared(vec)
      for(size_t i=0;i<vec.size();i++){
        m = std::max(m, std::abs(vec[i]));
      }
    
      // Note that this reduction would require communication if we had
      // implemented this to work on multiple nodes
      #pragma omp declare reduction(vec_plus : std::vector<FloatType> : \
        std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<FloatType>())) \
        initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))
    
      #pragma omp parallel default(none) reduction(vec_plus:Tf) shared(k,m,n,vec,std::cout)
      {
        const auto adr = SetRoundingMode(ROUNDING_MODE);
        const auto threads = omp_get_num_threads();
        const auto tid = omp_get_thread_num();
        const auto values_per_thread = n / threads;
        const auto nlow = tid * values_per_thread;
        const auto nhigh = (tid<threads-1) ? ((tid+1) * values_per_thread) : n;
    
        FloatType delta_f = n * m / (1 - 4 * (n + 1) * eps);
        FloatType Mf = mf_from_deltaf(delta_f);
    
        for(int f=0;f<k-1;f++){
          Tf[f] = ExtractVectorNew2<FloatType>(Mf, vec.begin() + nlow, vec.begin() + nhigh);
          delta_f = n * (4 * eps * Mf / 3) / (1 - 4 * (n + 1) * eps);
          Mf = mf_from_deltaf(delta_f);
        }
    
        FloatType M = Mf;
        for(size_t i=nlow;i<nhigh;i++){
          M += vec[i];
        }
        Tf[k-1] = M - Mf;
      }
    
      FloatType T = 0;
      for(const FloatType &tf: Tf){
        T += tf;
      }
    
      return T;
    }
    
    
    
    //Convenience wrappers
    template<bool Parallel, class FloatType>
    FloatType bitwise_deterministic_summation(
      const std::vector<FloatType> &vec,  // Note that we're making a copy!
      const int k
    ){
      if(Parallel){
        return parallel_bitwise_deterministic_summation<FloatType>(vec, k);
      } else {
        return serial_bitwise_deterministic_summation<FloatType>(vec, k);
      }
    }
    
    template<bool Parallel, class FloatType, class SimpleAccumType>
    FloatType simple_summation(const std::vector<FloatType> &vec){
      if(Parallel){
        return parallel_simple_summation<FloatType, SimpleAccumType>(vec);
      } else {
        return serial_simple_summation<FloatType, SimpleAccumType>(vec);
      }
    }
    
    template<bool Parallel, class FloatType>
    FloatType kahan_summation(const std::vector<FloatType> &vec){
      if(Parallel){
        return serial_kahan_summation<FloatType>(vec);
      } else {
        return parallel_kahan_summation<FloatType>(vec);
      }
    }
    
    
    
    // Timing tests for the summation algorithms
    template<bool Parallel, class FloatType, class SimpleAccumType>
    FloatType PerformTestsOnData(
      const int TESTS,
      std::vector<FloatType> floats, //Make a copy so we use the same data for each test
      std::mt19937 gen               //Make a copy so we use the same data for each test
    ){
      Timer time_deterministic;
      Timer time_kahan;
      Timer time_simple;
    
      //Very precise output
      std::cout.precision(std::numeric_limits<FloatType>::max_digits10);
      std::cout<<std::fixed;
    
      std::cout<<"Parallel? "<<Parallel<<std::endl;
      if(Parallel){
        std::cout<<"Max threads = "<<omp_get_max_threads()<<std::endl;
      }
      std::cout<<"Floating type                        = "<<typeid(FloatType).name()<<std::endl;
      std::cout<<"Floating type epsilon                = "<<std::numeric_limits<FloatType>::epsilon()<<std::endl;
      std::cout<<"Simple summation accumulation type   = "<<typeid(SimpleAccumType).name()<<std::endl;
      std::cout<<"Number of tests                      = "<<TESTS<<std::endl;
      std::cout<<"Input sample = "<<std::endl;
      for(size_t i=0;i<10;i++){
        std::cout<<"\t"<<floats[i]<<std::endl;
      }
    
      //Get a reference value
      std::unordered_map<FloatType, uint32_t> simple_sums;
      std::unordered_map<FloatType, uint32_t> kahan_sums;
      const auto ref_val = bitwise_deterministic_summation<Parallel, FloatType>(floats, 2);
      for(int test=0;test<TESTS;test++){
        std::shuffle(floats.begin(), floats.end(), gen);
    
        time_deterministic.start();
        const auto my_val = bitwise_deterministic_summation<Parallel, FloatType>(floats, 2);
        time_deterministic.stop();
        if(ref_val!=my_val){
          std::cout<<"ERROR: UNEQUAL VALUES ON TEST #"<<test<<"!"<<std::endl;
          std::cout<<"Reference      = "<<ref_val                   <<std::endl;
          std::cout<<"Current        = "<<my_val                    <<std::endl;
          std::cout<<"Reference bits = "<<binrep<FloatType>(ref_val)<<std::endl;
          std::cout<<"Current   bits = "<<binrep<FloatType>(my_val) <<std::endl;
          throw std::runtime_error("Values were not equal!");
        }
    
        time_kahan.start();
        const auto kahan_sum = kahan_summation<Parallel, FloatType>(floats);
        kahan_sums[kahan_sum]++;
        time_kahan.stop();
    
        time_simple.start();
        const auto simple_sum = simple_summation<Parallel, FloatType, SimpleAccumType>(floats);
        simple_sums[simple_sum]++;
        time_simple.stop();
      }
    
      std::cout<<"Average deterministic summation time = "<<(time_deterministic.total/TESTS)<<std::endl;
      std::cout<<"Average simple summation time        = "<<(time_simple.total/TESTS)<<std::endl;
      std::cout<<"Average Kahan summation time         = "<<(time_kahan.total/TESTS)<<std::endl;
      std::cout<<"Ratio Deterministic to Simple        = "<<(time_deterministic.total/time_simple.total)<<std::endl;
      std::cout<<"Ratio Deterministic to Kahan         = "<<(time_deterministic.total/time_kahan.total)<<std::endl;
    
      std::cout<<"Reference value                      = "<<std::fixed<<ref_val<<std::endl;
      std::cout<<"Reference bits                       = "<<binrep<FloatType>(ref_val)<<std::endl;
    
      std::cout<<"Distinct Kahan values                = "<<kahan_sums.size()<<std::endl;
      std::cout<<"Distinct Simple values               = "<<simple_sums.size()<<std::endl;
    
      int count = 0;
      for(const auto &kv: kahan_sums){
        std::cout<<"\tKahan sum values (N="<<std::fixed<<kv.second<<") "<<kv.first<<" ("<<binrep<FloatType>(kv.first)<<")"<<std::endl;
        if(count++==10){
          break;
        }
      }
    
      count = 0;
      for(const auto &kv: simple_sums){
        std::cout<<"\tSimple sum values (N="<<std::fixed<<kv.second<<") "<<kv.first<<" ("<<binrep<FloatType>(kv.first)<<")"<<std::endl;
        if(count++==10){
          break;
        }
      }
    
      std::cout<<std::endl;
    
      return ref_val;
    }
    
    // Use this to make sure the tests are reproducible
    template<class FloatType, class SimpleAccumType>
    void PerformTests(
      const int TESTS,
      const std::vector<long double> &long_floats,
      std::mt19937 &gen
    ){
      std::vector<FloatType> floats(long_floats.begin(), long_floats.end());
    
      const auto serial_val = PerformTestsOnData<false, FloatType, SimpleAccumType>(TESTS, floats, gen);
      const auto parallel_val = PerformTestsOnData<true, FloatType, SimpleAccumType>(TESTS, floats, gen);
    
      //Note that the `long double` type may only use 12-16 bytes (to maintain
      //alignment), but only 80 bits, resulting in bitwise indeterminism in the last
      //few bits; however, the floating-point values themselves will be equal.
      std::cout<<"########################################"<<std::endl;
      std::cout<<"### Serial and Parallel values match for "
               <<typeid(FloatType).name()
               <<"? "
               <<(serial_val==parallel_val)
               <<std::endl;
      std::cout<<"########################################\n"<<std::endl;
    }
    
    
    
    int main(){
      std::random_device rd;
      // std::mt19937 gen(rd());   //Enable for randomness
      std::mt19937 gen(123456789); //Enable for reproducibility
      std::uniform_real_distribution<long double> distr(-1000, 1000);
      std::vector<long double> long_floats;
      for(int i=0;i<N;i++){
        long_floats.push_back(distr(gen));
      }
    
      PerformTests<double, double>(TESTS, long_floats, gen);
      PerformTests<long double, long double>(TESTS, long_floats, gen);
      PerformTests<float, float>(TESTS, long_floats, gen);
      PerformTests<float, double>(TESTS, long_floats, gen);
    
      return 0;
    }
    
    为 Eric Postpischil 编辑
    埃里克 说:

    A generator such as I described will produce numbers with largely the same quantum—all near multiples of the divisor. This may not include a wide variety of exponents in the samples, so it may not well model a distribution where the numbers are being rounded in the interior of the significand when being added. E.g., if we add many numbers of the form 123.45, they will add fine for a while, although rounding errors will grow as the partial sums accumulate. But if we add numbers of forms 12345, 123.45, and 1.2345, different errors occur earlier


    将 123.45 的 1M 值相加得到 123450000.000000002837623469532 的单个 long double Kahan 值。确定性 long double 值与 -0.00000000000727595761 不同,而确定性 double 值与 -0.00000001206353772047 不同,而确定性浮点数与 -3.68% 不同(鉴于其大 epsilon,正如预期的那样)。
    从集合 {1.2345, 12.345, 123.45, 1234.5, 12345} 中随机选择 1M 个值给出两个长双 Kahan 值:A=2749592287.563000000780448317528 (N=54) 和 B=2749592287.563000000547617673874 (N=46)。确定性 long double 值与 (A) 完全匹配;确定性 double 值与 (A) 的不同之处在于 -0.00000020139850676247 ;确定性浮点值通过 -257% 与 (A) 不同(再次,预期)。

    关于algorithm - 我可以确定性地对任意排列的 float 的向量求和吗?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67444249/

    相关文章:

    python 列出 bin 大小的平均值

    java - 将 float 格式化为 n 位小数

    c++ - 为什么此代码在 x64 发布和调试中运行不同?

    c++ - 如何从文件中读取哈夫曼树频率

    c# - 使用多线程暴力破解密码

    c++ - sin 和 cos 很慢,有替代方案吗?

    performance - 查找数字的最快方法是什么?

    JavaScript 添加问题

    javascript - javascript中如何显示小数点后20位以上?

    c++ - 浮点与浮点文字比较的奇怪输出