I want to simulate 100 observations in 5 columns, with a correlation of 0.5 between the columns. To do this, I did the following:
F1 <- matrix( c(1, .5, .5, .5,.5,
.5, 1, .5, .5,.5,
.5, .5, 1, .5,.5,
.5, .5, .5, 1,.5,
.5, .5, .5, .5,1
), 5,5)
To simulate the desired data frame I tried this, but it does not work properly:
df2 <- as.data.frame (rbinom(100, 1,.5),ncol(5), F1)
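Best answer
I'm surprised this isn't a duplicate (this question refers specifically to non-binary responses, i.e. a binomial with N>1). The bindata package does what you want.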
library(bindata)
## set up correlation matrix (compound-symmetric with rho=0.5)
m <- matrix(0.5,5,5)
diag(m) <- 1
Simulate with a mean of 0.5 (as in your example):
set.seed(101)
## this simulates 10 rather than 100 realizations
## (I didn't read your question carefully enough)
## but it's easy to change
r <- rmvbin(n=10, margprob=rep(0.5,5), bincorr=m)
round(cor(r),2)
Result:
1.00  0.22  0.80  0.05  0.22
0.22  1.00  0.00  0.65  1.00
0.80  0.00  1.00 -0.09  0.00
0.05  0.65 -0.09  1.00  0.65
0.22  1.00  0.00  0.65  1.00
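With only 10 rows the sample correlations are very noisy, which is why they sit so far from 0.5 above. Since the question asked for 100 rows in a data frame, here is a minimal sketch of the same call with n=100, wrapped in as.data.frame(). The name df2 is taken from the question and r100 is just an illustrative name. The sample correlations should land closer to 0.5 but will still vary from seed to seed.

```r
library(bindata)

## compound-symmetric correlation matrix with rho = 0.5
m <- matrix(0.5, 5, 5)
diag(m) <- 1

set.seed(101)
## 100 realizations of 5 correlated Bernoulli(0.5) variables
r100 <- rmvbin(n = 100, margprob = rep(0.5, 5), bincorr = m)

## convert the 0/1 matrix to a data frame, as in the question
df2 <- as.data.frame(r100)

dim(df2)            # number of rows and columns
round(cor(df2), 2)  # sample correlations, to compare against 0.5
```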
The reference for this method is:
Leisch, Friedrich and Weingessel, Andreas and Hornik, Kurt (1998) On the generation of correlated artificial binary data. Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science", 13. SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, Vienna. https://epub.wu.ac.at/286/
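For intuition, here is a rough base-R sketch of the thresholding idea as I understand it from that paper: draw correlated normal variables and cut them at a threshold to get 0/1 values. This is not the package's code. It relies on a special case that I'm assuming here: all marginal probabilities are 0.5, so the threshold is 0 and a binary correlation rho corresponds to a latent normal correlation of sin(pi * rho / 2). The names rho_bin, rho_lat, S, z and x are made up for illustration.

```r
set.seed(101)

rho_bin <- 0.5                    # target correlation between the 0/1 columns
rho_lat <- sin(pi * rho_bin / 2)  # latent normal correlation (only valid for p = 0.5)

## compound-symmetric latent correlation matrix
S <- matrix(rho_lat, 5, 5)
diag(S) <- 1

## 100 x 5 independent standard normals, then induce correlation S:
## chol(S) returns upper-triangular R with t(R) %*% R == S
z <- matrix(rnorm(100 * 5), 100, 5) %*% chol(S)

## threshold at 0 so that each column is Bernoulli(0.5)
x <- (z > 0) * 1L

round(cor(x), 2)  # sample correlations, to compare against 0.5
```

For marginal probabilities other than 0.5 there is no such closed form, and the latent correlations have to be found numerically, which is what bindata takes care of.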
Regarding "r - simulating correlated Bernoulli data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59595292/