我最近一直在修补电源模拟,我有以下代码:
library(MASS)
library(Matrix)
simdat <- data.frame(mmm = rep(rep(factor(1:2,
labels=c("m1", "m2")),
each = 2),
times = 2800),
ttt = rep(factor(1:2,
labels = c("t1", "t2")),
times = 5600),
sss = rep(factor(1:70),
each = 160),
iii = rep(rep(factor(1:40),
each = 4),
times = 70))
beta <- c(1, 2)
X1 <- model.matrix(~ mmm,
data = simdat)
Z1 <- model.matrix(~ ttt,
data = simdat)
X1
和 Z1
是 11200x2
矩阵。在 Stackoverflow 的帮助下,我设法使我的计算比以前更高效:funab <- function(){
ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
Mb <- as.vector(X1 %*% beta)
M1 <- rowSums(Z1 * ran_sub[rep(1:70,
each = 160),])
M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
times = 70),])
Mout <- Mb + M1 + M2
Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}
Y
那么将是一个长度为 11200
的向量.然后我多次复制这个函数(比如 1000
次):sim <- replicate(n = 1000,
expr = funab()},
simplify = FALSE)
sim
将是 11200x1000
列表。鉴于我想更多地这样做,并且可能在 funab()
中包含更多代码我想知道是否建议对 X1
使用稀疏矩阵和 Z1
在 funab()
的计算中像现在这样?
最佳答案
好的,我已经尝试按照我的问题的评论中给出的建议进行测试,并使用 microbenchmark
进行了测试。包裹。为了使复制和粘贴更容易,我将重复上面的代码:
library(MASS)
library(Matrix)
simdat <- data.frame(mmm = rep(rep(factor(1:2,
labels=c("m1", "m2")),
each = 2),
times = 2800),
ttt = rep(factor(1:2,
labels = c("t1", "t2")),
times = 5600),
sss = rep(factor(1:70),
each = 160),
iii = rep(rep(factor(1:40),
each = 4),
times = 70))
beta <- c(1, 2)
X1 <- model.matrix(~ mmm,
data = simdat)
Z1 <- model.matrix(~ ttt,
data = simdat)
我现在创建与稀疏矩阵相同的矩阵:
sparseX1 <- sparse.model.matrix(~ mmm,
data = simdat)
sparseZ1 <- sparse.model.matrix(~ ttt,
data = simdat)
然后我设置了两个函数:
funab_sparse <- function(){
ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
Mb <- as.vector(sparseX1 %*% beta)
M1 <- Matrix::rowSums(sparseZ1 * ran_sub[rep(1:70,
each = 160),])
M2 <- Matrix::rowSums(sparseZ1 * ran_ite[rep(rep(1:40, each = 4),
times = 70),])
Mout <- Mb + M1 + M2
Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}
funab <- function(){
ran_sub <- mvrnorm(70, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
ran_ite <- mvrnorm(40, mu = c(0,0), Sigma = matrix(c(10, 3, 3, 2), ncol = 2))
Mb <- as.vector(X1 %*% beta)
M1 <- rowSums(Z1 * ran_sub[rep(1:70,
each = 160),])
M2 <- rowSums(Z1 * ran_ite[rep(rep(1:40, each = 4),
times = 70),])
Mout <- Mb + M1 + M2
Y <- as.vector(Mout) + rnorm(length(Mout), mean = 0 , sd = 0.27)
}
library(microbenchmark)
res <- microbenchmark(funab(), funab_sparse(), times = 1000)
并得到结果:
> res <- microbenchmark(funab(), funab_sparse(), times = 1000)
> res
Unit: milliseconds
expr min lq median uq max neval
funab() 2.200342 2.277006 2.309587 2.481627 69.99895 1000
funab_sparse() 8.419564 8.568157 9.666248 9.874024 75.88907 1000
假设我没有犯任何重大错误,我可以得出结论,使用这种使用稀疏矩阵进行计算的特殊方法不会加速我的代码。
关于r - 什么时候建议在 R 中使用稀疏矩阵?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24672606/