r - Setting the seed: why is the output different after no change in input?

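Setting a seed ensures reproducibility, which is important in simulation modelling. Consider a simple model f() with two variables of interest, y1 and y2. The outputs of these variables are determined by a random process (rbinom()) and the parameters x1 and x2. The outputs of the two variables are independent of each other.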

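Now suppose we want to compare the output of a variable after its parameter has changed with the output before the change (i.e. a sensitivity analysis). If no other parameter was changed and the same seed was set, shouldn't the output of the unaffected variable stay the same as in the default simulation, since that variable is independent of the other one?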

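In short, why does the output of y2 below, which is determined by parameter x2, change after only x1 has changed, even though the same seed is set? One could ignore the output of y2 and focus on y1 only, but in a larger simulation where each variable is a cost component of a total cost, changes in variables that should be unaffected become a problem when testing the overall sensitivity of the model to individual parameter changes.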

#~ parameters and model

x1 <- 0.0
x2 <- 0.5
n  <- 10
ts <- 5

f <- function(){
  out <- data.frame(step = rep(0, n),
                    space = 1:n,
                    id = 1:n,
                    y1 = rep(1, n),
                    y2 = rep(0, n))
  
  l.out <- vector(mode = "list", length = n)
  
  for(i in 1:ts){
    out$step <- i
    out$y1[out$y1 == 0] <- 1
    out$id[out$y2 == 1]  <- seq_along(which(out$y2 == 1)) + n
    out$y2[out$y2 == 1] <- 0
    
    out$y1 <- rbinom(nrow(out), 1, 1-x1)
    out$y2 <- rbinom(nrow(out), 1, x2)
    
    n  <- max(out$id)
    l.out[[i]] <- out
  }
  do.call(rbind, l.out)
}

#~ Simulation 1 (default)
set.seed(1)
run1 <- f()
set.seed(1)
run2 <- f()
run1 == run2 #~ all observations true as expected

#~ Simulation 2
#~ change in x1 parameter affecting only variable y1
x1 <- 0.25
set.seed(1)
run3 <- f()
set.seed(1)
run4 <- f()
run3 == run4 #~ all observations true as expected

#~ compare variables after change in x1 has occurred
run1$y1 == run3$y1  #~ observations differ as expected
run1$y2 == run3$y2  #~ observations differ - why?

Best answer

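Good question. The reason for this behaviour is that when you set p = 0 or p = 1 in rbinom, the underlying C function realises that it does not need to sample with the random number generator at all. The generator's state (the thing you fix with set.seed()) only advances when the random number generator is actually called, so if p is strictly between 0 and 1 the state advances, but if p is exactly 0 or 1 it does not. In your default simulation x1 is 0, so y1 is drawn with p = 1 - x1 = 1 and consumes no random numbers, whereas with x1 = 0.25 it consumes some before y2 is drawn. You can see this in the source code.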
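One way to check this explanation directly is to compare the generator state before and after each call. The snippet below is a minimal sketch and not part of the original answer; the variable names (state_start, unchanged_p1, unchanged_p0, changed_mid) are made up for illustration.

```r
# Sketch: inspect .Random.seed to see whether rbinom() consumed random numbers.
set.seed(1)
state_start <- .Random.seed

invisible(rbinom(5, 1, 1))   # p = 1: result is known without sampling
unchanged_p1 <- identical(state_start, .Random.seed)

invisible(rbinom(5, 1, 0))   # p = 0: result is known without sampling
unchanged_p0 <- identical(state_start, .Random.seed)

invisible(rbinom(5, 1, 0.5)) # 0 < p < 1: the generator has to be called
changed_mid <- !identical(state_start, .Random.seed)

# All three flags are expected to be TRUE if the explanation above holds
c(unchanged_p1, unchanged_p0, changed_mid)
```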

Under normal circumstances, when p is greater than zero and less than one, your setup should work as expected:

set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 0.1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] TRUE

But if you set p to 1 or 0, the results will differ:

set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] FALSE

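To show that this explanation is correct: if we set p to 0 the first time and p to 1 the second time, neither call uses the generator, so we should get y1 == y2: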

set.seed(1)
x1 <- rbinom(5, 1, 0)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] TRUE
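
For the sensitivity analysis in the question, one possible workaround is to draw the Bernoulli outcomes from runif(), which uses one uniform number per draw whatever the value of p. This is only a sketch under that assumption, not something the original answer proposes; rbern is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: Bernoulli draws that always consume n uniforms,
# including for p = 0 and p = 1.
rbern <- function(n, p) as.integer(runif(n) < p)

set.seed(1)
a1 <- rbern(5, 1 - 0.0)   # p = 1, as in the default simulation
b1 <- rbern(5, 0.5)

set.seed(1)
a2 <- rbern(5, 1 - 0.25)  # p = 0.75, after changing x1
b2 <- rbern(5, 0.5)

all(b1 == b2)  # the second stream no longer depends on the first p
```

The draws from rbern() will not match those from rbinom() for the same seed, so earlier results produced with rbinom() would have to be regenerated.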

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/64928795/
