r - Simulating correlated Bernoulli data

I want to simulate 100 rows of data with 5 columns, with a correlation of 0.5 between every pair of columns. To set this up, I did the following:

F1 <- matrix( c(1, .5, .5, .5,.5,
                   .5, 1, .5, .5,.5,
                   .5, .5, 1, .5,.5,
                   .5, .5, .5, 1,.5,
                   .5, .5, .5, .5,1
), 5,5)

To simulate the desired data frame I then did this, but it doesn't work correctly:

 df2 <- as.data.frame (rbinom(100, 1,.5),ncol(5), F1)
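The attempt above cannot work for a simple reason. `rbinom()` draws every value independently, and `F1` is never used to generate anything, so any correlation between columns is just sampling noise around 0. The sketch below uses only base R, and the names `x` and `df_indep` are illustrative. It builds a 100 × 5 data frame of independent Bernoulli(0.5) draws and prints its correlation matrix.

```r
## Sketch: independent Bernoulli draws (what rbinom() produces) are
## uncorrelated, so their sample correlations scatter around 0, not 0.5.
set.seed(101)

## 100 rows x 5 columns of independent Bernoulli(0.5) draws
x <- matrix(rbinom(100 * 5, size = 1, prob = 0.5), ncol = 5)
df_indep <- as.data.frame(x)

dim(df_indep)             # 100 5
round(cor(df_indep), 2)   # off-diagonal values are small, near 0
```

Getting the 0.5 correlations requires generating the columns jointly, which is what the answer below does.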

Best answer

I'm surprised this isn't a duplicate (this question refers specifically to non-binary responses, i.e. binomial with N > 1). The bindata package does what you want.

library(bindata)
## set up correlation matrix (compound-symmetric with rho=0.5)
m <- matrix(0.5,5,5)
diag(m) <- 1

Simulate with a mean of 0.5 for each column (as in your example):

set.seed(101)
## this simulates 10 rather than 100 realizations
## (I didn't read your question carefully enough)
## but it's easy to change
r <- rmvbin(n=10, margprob=rep(0.5,5), bincorr=m)
round(cor(r),2)

Result:

 1.00 0.22  0.80  0.05 0.22
 0.22 1.00  0.00  0.65 1.00
 0.80 0.00  1.00 -0.09 0.00
 0.05 0.65 -0.09  1.00 0.65
 0.22 1.00  0.00  0.65 1.00

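This doesn't look right, since the correlations are not exactly 0.5, but they will be on average. When I sampled 10,000 vectors instead of 10, the off-diagonal values ranged from about 0.48 to 0.51. Equivalently, if you simulated many samples of size 10 and computed the correlation matrix for each one, you should find that the expected (average) correlation matrix is correct.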
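The averaging claim can be checked directly. The sketch below assumes the bindata package is installed and rebuilds the compound-symmetric matrix `m` from above. It draws one large sample of 10,000 vectors and then averages the correlation matrices of 500 independent samples of size 10. A column that happens to be constant in a small sample has zero variance and yields `NA` correlations. This sketch simply drops those values from the average, which is its own choice and slightly conditions the result. The names `r_big`, `cor_array` and `avg_cor` are illustrative.

```r
## Sketch (assumes the bindata package is installed): the simulated
## correlations are right on average, not in every small sample.
library(bindata)

## compound-symmetric correlation matrix with rho = 0.5
m <- matrix(0.5, 5, 5)
diag(m) <- 1
off_diag <- upper.tri(m)   # logical index of the 10 distinct pairs

set.seed(101)

## 1) one large sample: every pairwise correlation should be close to 0.5
r_big <- rmvbin(n = 10000, margprob = rep(0.5, 5), bincorr = m)
round(range(cor(r_big)[off_diag]), 2)

## 2) many small samples: average the per-sample correlation matrices
cor_array <- replicate(500, {
  r_small <- rmvbin(n = 10, margprob = rep(0.5, 5), bincorr = m)
  ## a column that is constant in this sample gives NA (plus a warning)
  suppressWarnings(cor(r_small))
})                                   # a 5 x 5 x 500 array
avg_cor <- apply(cor_array, c(1, 2), mean, na.rm = TRUE)
round(avg_cor, 2)                    # off-diagonal entries close to 0.5
```

Individual size-10 matrices vary a lot, like the one in the result above, but the averaged matrix should sit near the target.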

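Simulating values whose sample correlations are exactly equal to the specified values is much harder (and, depending on the application, not necessarily what you want to do).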

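Note that there are some restrictions on which mean vectors and correlation matrices are feasible. For example, the off-diagonal elements of an n×n compound-symmetric (equicorrelation) matrix cannot be less than -1/(n-1). Similarly, there may be restrictions on which correlations are possible for a given set of means (this may be discussed in the technical reference; I haven't checked).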
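Both kinds of restriction can be illustrated numerically. The sketch below uses two hypothetical helpers, `cs_matrix` and `bern_cor_bounds`, which are not part of bindata.

- `cs_matrix` builds a compound-symmetric matrix. Its eigenvalues are 1 + (n-1)ρ (once) and 1 - ρ (n-1 times), so it stops being positive semidefinite as soon as ρ < -1/(n-1).
- `bern_cor_bounds` uses the standard Fréchet bounds max(0, p+q-1) ≤ P(X=1, Y=1) ≤ min(p, q). From these it computes the range of correlations allowed for two Bernoulli variables with means p and q.

The second bound is a general fact about Bernoulli pairs, not something taken from the reference below.

```r
## Hypothetical helper: n x n compound-symmetric matrix with off-diagonal rho
cs_matrix <- function(n, rho) {
  m <- matrix(rho, n, n)
  diag(m) <- 1
  m
}

## Eigenvalues are 1 + (n - 1) * rho and 1 - rho, so the matrix is a valid
## (positive semidefinite) correlation matrix only if rho >= -1/(n - 1)
min(eigen(cs_matrix(5, -0.25))$values)   # approximately 0: boundary -1/(5-1)
min(eigen(cs_matrix(5, -0.30))$values)   # about -0.2: not a valid matrix

## Hypothetical helper: correlation range for Bernoulli(p) and Bernoulli(q),
## from the Frechet bounds on the joint probability P(X = 1, Y = 1)
bern_cor_bounds <- function(p, q) {
  s <- sqrt(p * (1 - p) * q * (1 - q))      # product of the two SDs
  lower <- (max(0, p + q - 1) - p * q) / s
  upper <- (min(p, q) - p * q) / s
  c(lower = lower, upper = upper)
}

bern_cor_bounds(0.5, 0.5)   # lower = -1,   upper = 1
bern_cor_bounds(0.2, 0.5)   # lower = -0.5, upper = 0.5
```

With unequal means, a correlation of 0.5 may already be at or beyond what the margins allow. This is why `rmvbin()` can reject some combinations of `margprob` and `bincorr`.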

The reference for this approach is:

Leisch, Friedrich, Andreas Weingessel, and Kurt Hornik (1998). On the generation of correlated artificial binary data. Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science", 13. WU Vienna University of Economics and Business, Vienna. https://epub.wu.ac.at/286/

A similar question about r - simulating correlated Bernoulli data can be found on Stack Overflow: https://stackoverflow.com/questions/59595292/
