r - Parallelizing large simulations over a grid in R

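I am running a series of large simulations over a grid. I implement the simulations row by row, and I have found that my sampling functions are a bottleneck. I tried using the foreach and doMC libraries to speed things up, but either the parallel approach turned out slower or I could not write a function that foreach would interpret correctly.

Looking at some other posts, it seems my approach with foreach may be misguided, because the number of jobs I attempt greatly exceeds the number of available processors. I was wondering whether people had suggestions on how best to implement parallelization in my situation. My simulations generally come in two types. In the first, I compute a matrix that holds a sampling interval (rows) for each element of the grid row I am processing. I then sample using runif (in the real simulations my rows contain about 9000 cells and I run 10000 simulations).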

# Number of simulations per element
n <- 5

# Generate an example sampling interval:
# row 1 holds the lower bounds, row 2 the upper bounds.
m.int1 <- matrix(seq(1, 20, 1), ncol = 10, nrow = 2)

# Sample n values uniformly over the interval given by one column of m.int1
f.rand1 <- function(a) {
  runif(n, a[1], a[2])
}

# Run the simulation: each column corresponds to a row element,
# each row to one simulation.
sim1 <- round(apply(m.int1, 2, f.rand1))

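In the second case, I am trying to sample from a set of empirical distributions that are indexed by column in a matrix. The value of each grid-row element corresponds to the column to be sampled.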

# Number of simulations per element
n <- 5

# Generate a vector representing a row of grid values (column indices 1 to 3)
v.int2 <- round(runif(10, 1, 3))

# Matrix whose columns hold the distributions to be sampled
m.samples <- cbind(rep(5, 10), rep(4, 10), rep(3, 10))

# Draw n values from column a of m.samples
f.sample <- function(a) {
  sample(m.samples[, a], n)
}

# Sample m.samples indexed by column number
sim2 <- sapply(v.int2, f.sample)
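
As a point of comparison, the per-element sapply call can in principle be replaced by a single matrix-indexing step. The following is only a sketch under one assumption that differs from the code above: values are drawn with replacement, which is what sampling from an empirical distribution usually means and which is required once n exceeds the number of rows in m.samples. The names k, rows, cols and sim2.vec are illustrative.

```r
n <- 5
v.int2 <- round(runif(10, 1, 3))
m.samples <- cbind(rep(5, 10), rep(4, 10), rep(3, 10))
k <- length(v.int2)

# Assumption: sampling WITH replacement (the sapply version samples without).
# One random row index per draw, n draws for each of the k grid elements.
rows <- sample.int(nrow(m.samples), n * k, replace = TRUE)

# Column (distribution) index for each draw; each = n matches column-major filling.
cols <- rep(v.int2, each = n)

# Index m.samples with a two-column (row, column) matrix, then reshape to n x k.
sim2.vec <- matrix(m.samples[cbind(rows, cols)], nrow = n, ncol = k)
```

Column j of sim2.vec then holds n draws from column v.int2[j] of m.samples, the same layout as sim2.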

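In the second example I was able to use foreach() and %dopar% to run in parallel, but the simulation took substantially longer than the serial code. In the first example above, I could not write a proper function to take advantage of foreach parallelization. I include the code I used in the second case only to demonstrate my thinking; I now realize that my approach has too much overhead.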

library(foreach)
library(doMC)
registerDoMC(2)

n <- 5

# Sample m.samples indexed by column number, in parallel
sim2.par <- foreach(i = seq_along(v.int2), .combine = "cbind") %dopar%
  sample(m.samples[, v.int2[i]], n)
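
The overhead in a loop like this comes largely from dispatching one tiny task per grid element. A common way to reduce it is to hand each worker one chunk of elements instead. The sketch below illustrates that idea with the base parallel package rather than foreach/doMC (a deliberate substitution so the example needs no extra packages). mclapply is fork-based, like doMC, so it does not run in parallel on Windows. The worker count and the names n.workers, chunks, sample.chunk and sim2.chunked are assumptions made for illustration.

```r
library(parallel)

n <- 5
v.int2 <- round(runif(10, 1, 3))
m.samples <- cbind(rep(5, 10), rep(4, 10), rep(3, 10))

n.workers <- 2  # assumed number of cores to use

# Split the element indices into one contiguous chunk per worker.
chunks <- splitIndices(length(v.int2), n.workers)

# Sample all elements of one chunk; returns an n.sim x length(idx) matrix.
sample.chunk <- function(idx, cols, dists, n.sim) {
  vapply(cols[idx],
         function(j) sample(dists[, j], n.sim),
         numeric(n.sim))
}

# One task per worker instead of one task per grid element.
parts <- mclapply(chunks, sample.chunk,
                  cols = v.int2, dists = m.samples, n.sim = n,
                  mc.cores = n.workers)

# Chunks are contiguous and ordered, so cbind restores the element order.
sim2.chunked <- do.call(cbind, parts)
```

With about 9000 elements per row, this dispatches a handful of tasks rather than 9000, so the fixed cost per task is paid only a few times.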

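I would appreciate some advice on an approach (and some code!) that would help me use parallelization efficiently. Again, the rows I am processing generally contain about 9000 elements, and we run 10000 simulations per element, so my output simulation matrices are generally on the order of 10000 x 9000. Thanks for your help.

Best answer

Here is a slight improvement on your first simulation. A larger n may give a greater gain in run time.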

> n <- 1000
> m.int1 <- matrix ( seq ( 1, 20, 1 ), ncol=10, nrow=2 )
> f.rand1 <- function(a) {
+    return(runif(n, a[1], a[2]))
+ }
> system.time(x1 <- replicate(n, round(apply(m.int1, 2, f.rand1))))
   user  system elapsed 
   2.84    0.06    2.95 
> system.time(x2 <- replicate(n, matrix(round(runif(n*10, min = m.int1[1, ], max = m.int1[2, ])), ncol = 10, byrow = TRUE)))
   user  system elapsed 
   2.48    0.06    2.61 
> head(x1[,,1])
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    4    5    7   10   12   13   16   17    20
[2,]    1    3    6    7   10   11   13   16   17    19
[3,]    1    3    6    7   10   12   14   16   18    20
[4,]    2    4    5    7    9   12   14   16   17    19
[5,]    1    4    5    7   10   12   14   16   17    20
[6,]    1    4    6    8    9   11   13   15   18    20
> head(x2[,,1])
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    4    6    7    9   12   14   16   17    20
[2,]    1    3    6    8   10   12   14   15   18    20
[3,]    2    4    5    7    9   11   13   15   17    20
[4,]    2    3    5    7    9   11   14   15   17    19
[5,]    2    3    6    7    9   12   13   16   17    20
[6,]    2    4    6    7   10   12   14   16   17    20
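
Going one step further, the whole n x k matrix for one grid row can be generated with a single runif call and no apply at all, because runif accepts vectors for min and max. This is a minimal sketch of that idea, not a benchmarked result. The names k and sim1.vec are illustrative.

```r
n <- 1000
m.int1 <- matrix(seq(1, 20, 1), ncol = 10, nrow = 2)
k <- ncol(m.int1)

# matrix() fills column by column, so repeating each bound n times
# lines the bounds up with the n simulations of each grid element.
lower <- rep(m.int1[1, ], each = n)
upper <- rep(m.int1[2, ], each = n)

# One vectorized draw for all n * k values, reshaped to n rows x k columns.
sim1.vec <- matrix(round(runif(n * k, min = lower, max = upper)),
                   nrow = n, ncol = k)
```

At the full problem size, a 10000 x 9000 matrix of doubles takes about 720 MB, so memory may matter as much as run time there.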

This question and answer on parallelizing large simulations over a grid in R come from a similar question on Stack Overflow: https://stackoverflow.com/questions/14838816/
