c++ - 在 Rcpp 中为具有重复迭代的模型正确设置 GSL RNG 种子

我正在编写一个随机的、过程驱动的感染传播模型和诊断测试来检测感染。该模型需要在多个时间步长和迭代中重复随机样本。我的模型运行得越快越好。对于模型中的随机采样，随机样本的参数可以在模型中的每个时间步发生变化。我首先用 R 编写模型，然后用 CPP(通过很棒的 Rcpp 包)编写。在 Rcpp 中，使用基于 R 的随机数生成器，该模型的运行时间大约是 R 中的 7%。有人建议我在 CPP 中使用 GSL 来生成随机数，速度又更快了。在 CPP 模型中，使用基于 GSL 的随机采样而不是基于 R 的随机采样，速度略有提高。但是，我不确定我是否正确使用了基于 GSL 的随机采样器。

我的问题是:

仅根据一天中的时间为 GSL RNG 执行一次种子设置程序并为我的所有随机抽奖使用相同的构造(正如我在下面的代码中所做的那样)是否正确？我承认我并不完全理解 CPP 中 GSL 的种子设置程序，因为我对这两者都是新手。我比较了使用基于 R 和基于 GSL 的 RNG 生成的分布，它们非常相似，所以希望这一点没问题。

我从这个 Stack Overflow 帖子中获得了根据一天中的时间设置 GSL 种子的代码:

GSL Uniform Random Number Generator

我期望使用 GSL RNG 能够大幅提高速度。我可以做些什么来最大限度地提高 GSL RNG 的速度吗？

我使用的是 Windows 计算机和 RStudio 界面。我使用 Rcpp 包从 R 获取 CPP 函数。所有软件包和程序最近都已重新安装。这是 session 信息: R版本4.2.2(2022-10-31 ucrt) 平台:x86_64-w64-mingw32/x64(64位) 运行环境:Windows 10 x64(内部版本 22000)

就上下文而言，我是一名具有 R 经验的 vert 流行病学家，但仅学习 CPP 两个月。这是我的第一个堆栈交换查询。预先感谢您的宝贵时间!

这是我试图用 CPP(在 RStudio 中使用 Rcpp)编写并使用基于 GSL 的 RNG 实现的示例。请有人告诉我这是否是设置 GSL RNG 种子的正确方法？在函数顶部只执行一次种子设置过程可以吗？

// CPP code - function GSL RNG written using Rcpp on a CPP file in RStudio

// [[Rcpp::plugins(cpp11)]]

#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_blas.h>
#include <iostream>
#include <gsl/gsl_math.h>
#include <sys/time.h>
#include <RcppGSL.h>

// [[Rcpp::depends(RcppGSL)]]

// [[Rcpp::export]]


Rcpp:: NumericMatrix check_cpp_gsl_rng(int n_iters,
                                 int min_unif,
                                 int max_unif,
                                 double exp_rate,
                                 double bernoulli_prob) 
{
  const gsl_rng_type * T;
  gsl_rng * r;
  gsl_rng_env_setup();
  struct timeval tv; // Seed generation based on time
  gettimeofday(&tv,0);
  unsigned long mySeed = tv.tv_sec + tv.tv_usec;
  T = gsl_rng_default; // Generator setup
  r = gsl_rng_alloc (T);
  gsl_rng_set(r, mySeed);

  // matrix to collect outputs
  
  Rcpp:: NumericMatrix Output_Mat(n_iters, 7); 
  
  for (int i = 0; i < n_iters; i++) // in real model, parameters may change for each iteration
    
  {
    // random exponential draws
    
    Output_Mat(i, 0) = gsl_ran_exponential(r , (1 / exp_rate)); // exp 1
    Output_Mat(i, 1) = gsl_ran_exponential(r , (1 / exp_rate)); // exp 2
    
    // random uniform draws
    
    Output_Mat(i, 2) = gsl_ran_flat(r, min_unif, max_unif); // unif 1
    Output_Mat(i, 3) = gsl_ran_flat(r, min_unif, max_unif); // unif 2
    
    // random Bernoulli draws

    Output_Mat(i, 4) = gsl_ran_bernoulli(r, bernoulli_prob); // Bernoulli 1
    Output_Mat(i, 5) = gsl_ran_bernoulli(r, bernoulli_prob); // Bernoulli 2
    
    
    Output_Mat(i, 6) = i; // record iteration number
  }
  
  return Output_Mat;
  
  gsl_rng_free(r);
  
  // end of function
  
}

下图显示了仅在 R 中实现的随机采样函数、使用 R RNG 的 CPP 和使用 GSL RNG(如上面的代码)的 CPP 的运行速度比较，基于使用“微基准测试”的 1000 次迭代的 100 次比较“包。

最佳答案

您可能会发现有用的包是我的 RcppZiggurat ( github )。它恢复了旧的但快速的 Ziggurat RNG，用于正常协变量并对其进行计时。它使用其他几个 Ziggurat 实现作为基准 - 包括来自 GSL 的一个。

首先，我们可以使用它的代码和基础设施来建立一个简单的结构(见下文)。我首先表明“确实是”我们可以播种 GSL RNG:

> setseedGSL(42)
> rnormGSLZig(5)
[1] -0.811264  1.092556 -1.873074 -0.146400 -1.653703
> rnormGSLZig(5)    # different
[1] -1.281593  0.893496 -0.545510 -0.337940 -1.258800
> setseedGSL(42)
> rnormGSLZig(5)    # as before
[1] -0.811264  1.092556 -1.873074 -0.146400 -1.653703
>

请注意，我们需要一个 GSL RNG“状态”实例的全局变量。

其次，我们可以证明 Rcpp 实际上比标准普通 GSL 生成器或其 Ziggurat 实现更快。使用 Rcpp 矢量化速度更快:

> library(microbenchmark)
> n <- 1e5
> res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
> res
Unit: microseconds
             expr     min        lq     mean   median       uq      max neval cld
   rnormGSLZig(n) 996.580 1151.7065 1768.500 1355.053 1424.220 18597.82   100   b
 rnormGSLPlain(n) 996.316 1085.6820 1392.323 1358.696 1431.715  2929.05   100   b
      rcppLoop(n) 223.221  259.2395  641.715  518.706  573.899 13779.20   100  a 
    rcppDirect(n)  46.224   67.2075  384.004  293.499  320.919 14883.86   100  a 
>

代码如下；这是我的 RcppZiggurat 包的快速改编。您可以 sourceCpp() 它(如果您安装了 RcppGSL，我用它来“轻松”获取 GSL 的编译和链接指令)，它将运行上面显示的演示代码。

#include <Rcpp/Lighter>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

// [[Rcpp::depends(RcppGSL)]]

class ZigguratGSL {
public:
    ZigguratGSL(uint32_t seed=12345678) {
        gsl_rng_env_setup() ;
        r = gsl_rng_alloc (gsl_rng_default);
        gsl_rng_set(r, seed);
    }
    ~ZigguratGSL() {
        gsl_rng_free(r);
    }
    double normZig() {
        const double sigma=1.0;
        return gsl_ran_gaussian_ziggurat(r, sigma);
    }
    double normPlain() {
        const double sigma=1.0;
        return gsl_ran_gaussian_ziggurat(r, sigma);
    }
    void setSeed(const uint32_t seed) {
        gsl_rng_set(r, seed);
    }
private:
    gsl_rng *r;
};

static ZigguratGSL gsl;

// [[Rcpp::export]]
void setseedGSL(const uint32_t s) {
    gsl.setSeed(s);
    return;
}

// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLZig(int n) {
    Rcpp::NumericVector x(n);
    for (int i=0; i<n; i++) {
        x[i] = gsl.normZig();
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLPlain(int n) {
    Rcpp::NumericVector x(n);
    for (int i=0; i<n; i++) {
        x[i] = gsl.normPlain();
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rcppLoop(int n) {
    Rcpp::NumericVector x(n);
    for (int i=0; i<n; i++) {
        x[i] = R::rnorm(1.0,0.0);
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rcppDirect(int n) {
    return Rcpp::rnorm(n, 1.0, 0.0);
}


/*** R
setseedGSL(42)
rnormGSLZig(5)
rnormGSLZig(5)    # different
setseedGSL(42)
rnormGSLZig(5)    # as before


library(microbenchmark)
n <- 1e5
res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
res
*/

PS 我们将其写为Rcpp。大写 R，小写 cpp。

关于c++ - 在 Rcpp 中为具有重复迭代的模型正确设置 GSL RNG 种子，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/75322283/

c++ - 在 Rcpp 中为具有重复迭代的模型正确设置 GSL RNG 种子

上一篇：http - 如何从 Rust 发送超过 65535 个符号的 GET 请求？

下一篇：r - 使用类似于 xpath 的方式访问 R 中列表的元素