r - Conditional random samples in R

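I'd like to know the best way to approach this problem. Essentially, I want to generate 20 rows of samples where each row (x1, x2, x3, x4) sums to 100 and also satisfies x1 + x2 > 20. I'm trying to get something fast and efficient. I realize I could filter out the rows that don't meet this condition, but that is inefficient if I generate 10,000 rows instead of 20.

Here is my code: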

n <- 20
x1 <- sample(0:100, n, replace = TRUE)
x2 <- sample(0:100, n, replace = TRUE)
x3 <- sample(0:100, n, replace = TRUE)
# A row is invalid if x1 + x2 + x3 exceeds 100 (x4 would be negative)
# or if x1 + x2 is not greater than 20.
bad <- (x1 + x2 + x3) > 100 | (x1 + x2) <= 20
while (any(bad)) {
    k <- sum(bad)  # redraw only the invalid rows
    x1[bad] <- sample(0:100, k, replace = TRUE)
    x2[bad] <- sample(0:100, k, replace = TRUE)
    x3[bad] <- sample(0:100, k, replace = TRUE)
    bad <- (x1 + x2 + x3) > 100 | (x1 + x2) <= 20
}
x4 <- 100 - x1 - x2 - x3

df <- data.frame(x1,x2,x3,x4)

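Thanks in advance.

Best answer

Another possibility: choose three break points in the sequence 0:100, then take x1, x2, x3, and x4 as the gaps between those break points and the ends of the range. If x1 + x2 is at most 20, then x3 + x4 is at least 80 and therefore greater than 20, so we can swap the two pairs.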

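generate_four_numbers <- function(from = 0, to = 100) {
    # Three sorted break points in [from, to]
    breaks <- sort(sample(seq(from, to), 3, replace = TRUE))
    # The four gaps sum to (to - from), i.e. 100 with the defaults
    x1 <- breaks[1] - from
    x2 <- breaks[2] - breaks[1]
    x3 <- breaks[3] - breaks[2]
    x4 <- to - breaks[3]

    # If x1 + x2 is too small, reverse the order; with the defaults,
    # x4 + x3 is then at least 80
    if (x1 + x2 <= 20) {
        return(data.frame(x1 = x4, x2 = x3, x3 = x2, x4 = x1))
    }

    data.frame(x1, x2, x3, x4)
}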

res <- do.call(rbind, lapply(1:10000, function(x) generate_four_numbers()))

table(rowSums(res)) # all at 100

length(which(res$x1 + res$x2 > 20)) / nrow(res) # 100 % acceptable
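
Since the question asks about efficiency at 10,000 rows, the same break-point idea can probably be vectorized so that all rows are drawn at once instead of building one data frame per row. The following is only a sketch: the function name `generate_rows` and the parameters `total` and `min_sum` are hypothetical and do not appear in the original answer, and it assumes `total > 2 * min_sum` so that the swap always yields a valid row.

```r
# Vectorized sketch of the break-point approach (hypothetical helper).
generate_rows <- function(n, total = 100, min_sum = 20) {
  # Assumption: the swap trick only works when total > 2 * min_sum.
  stopifnot(total > 2 * min_sum)

  # Draw three break points per row from 0:total.
  a <- sample(0:total, n, replace = TRUE)
  b <- sample(0:total, n, replace = TRUE)
  c <- sample(0:total, n, replace = TRUE)

  # Sort the three break points row-wise without a loop.
  b1 <- pmin(a, b, c)         # smallest
  b3 <- pmax(a, b, c)         # largest
  b2 <- a + b + c - b1 - b3   # middle

  # Gaps between break points; every row sums to `total`.
  x <- cbind(x1 = b1, x2 = b2 - b1, x3 = b3 - b2, x4 = total - b3)

  # Where x1 + x2 is too small, reverse the column order for that row.
  swap <- (x[, "x1"] + x[, "x2"]) <= min_sum
  if (any(swap)) {
    x[swap, ] <- x[swap, 4:1, drop = FALSE]
  }

  as.data.frame(x)
}

set.seed(1)  # only to make the example reproducible
res <- generate_rows(10000)
```

The same checks as above, `table(rowSums(res))` and `mean(res$x1 + res$x2 > 20)`, can be used to confirm that every row sums to 100 and meets the condition.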

This post on conditional random samples in R is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/51440993/
