r - 如何使用Rcpp加速for循环?

标签 r performance for-loop rcpp

我已经创建了一个for循环,我想使用Rcpp库来加快速度。我对C ++不太熟悉。您能帮我加快我的功能吗?
谢谢您的帮助!

我已经将我的算法,代码以及输入和输出以及sessionInfo包括在内。

这是我的算法:

如果当前价格高于先前价格,请在名为TR的列中标记(+1)

如果当前价格低于先前价格,请在名为TR的列中标记(-1)

如果当前价格与之前的价格相同,
在名为TR的列中标记与先前价格相同的事物

这是我的代码:

price <- c(71.91, 71.82, 71.81, 71.81, 71.81, 71.82, 71.81, 71.81, 71.81, 
           71.82, 71.81, 71.81, 71.8, 71.81, 71.8, 71.81, 71.8, 71.8, 71.8, 
           71.8, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 
           71.81, 71.82, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.8, 
           71.8, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.81, 71.81, 
           71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.81, 71.81, 71.81, 71.82, 71.82, 
           71.81, 71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.81, 
           71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.81, 
           71.82, 71.82, 71.82, 71.82, 71.83, 71.82, 71.82, 71.82, 71.81, 
           71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 
           71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.83, 71.83, 71.83, 71.83, 
           71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 
           71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 
           71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 
           71.83)

TR <- numeric(length(price)-1)
TR <- c(NA,TR)

for (i in 1: (length(price)-1)){

  if (price[i] == price[i+1]) {TR[i+1] = TR[i]}

  if (price[i] < price[i+1]) {TR[i+1] = 1}

  if (price[i] > price[i+1]) {TR[i+1] = -1}

}


这是我的输出:dput(TR)yields

c(NA, -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, -1, 1, 
  -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 
  -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, 
  1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, 
  1, 1, -1, 1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
  -1, 1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1)


这是我的sessionInfo:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.4

loaded via a namespace (and not attached):
[1] chron_2.3-45  plyr_1.8.1    Rcpp_0.11.1   reshape2_1.4  stringr_0.6.2 tools_3.1.2  

最佳答案

您可以直接翻译for循环:

library(Rcpp)
cppFunction(
"IntegerVector proc(NumericVector x) {
  const int n = x.size();
  IntegerVector y(n);
  y[0] = NA_INTEGER;
  for (int i=1; i < n; ++i) {
    if (x[i] == x[i-1]) y[i] = y[i-1];
    else if (x[i] > x[i-1]) y[i] = 1;
    else y[i] = -1;
  }
  return y;
}")


通常,与基R中的for循环相比,使用Rcpp可以得到相当大的加速:

proc.for <- function(price) {
  TR <- numeric(length(price)-1)
  TR <- c(NA,TR)
  for (i in 1: (length(price)-1)){
    if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
    if (price[i] < price[i+1]) {TR[i+1] = 1}
    if (price[i] > price[i+1]) {TR[i+1] = -1}
  }
  return(TR)
}
proc.aaron <- function(price) {
  change <- sign(diff(price))
  good <- change != 0
  goodval <- change[good]
  c(NA, goodval[cumsum(good)])
}
proc.jbaums <- function(price) {
  TR <- sign(diff(price))
  TR[TR==0] <- TR[which(TR != 0)][findInterval(which(TR == 0), which(TR != 0))]
  TR
}

all.equal(proc(price), proc.for(price), proc.aaron(price), proc.jbaums(price))
# [1] TRUE
library(microbenchmark)
microbenchmark(proc(price), proc.for(price), proc.aaron(price), proc.jbaums(price))
# Unit: microseconds
#                expr     min       lq      mean   median       uq      max neval
#         proc(price)   1.871   2.5380   3.92111   3.1110   4.5880   15.318   100
#     proc.for(price) 408.200 448.2830 542.19766 484.1265 546.3255 1821.104   100
#   proc.aaron(price)  23.916  25.5770  33.53259  31.5420  35.8575  190.372   100
#  proc.jbaums(price)  33.536  38.8995  46.80109  43.4510  49.3555  112.306   100


与for循环相比,我们看到的加速超过100倍,与所提供向量的矢量化替代相比,加速了10倍。

使用更大的向量(此处测试的长度为一百万),加速甚至更为重要:

price.big <- rep(price, times=5000)
all.equal(proc(price.big), proc.for(price.big), proc.aaron(price.big), proc.jbaums(price.big))
# [1] TRUE
microbenchmark(proc(price.big), proc.for(price.big), proc.aaron(price.big), proc.jbaums(price.big))
# Unit: milliseconds
#                    expr         min          lq        mean      median          uq        max neval
#         proc(price.big)    1.442119    1.818494    5.094274    2.020437    2.771903   56.54321   100
#     proc.for(price.big) 2639.819536 2699.493613 2949.962241 2781.636460 3062.277930 4472.35369   100
#   proc.aaron(price.big)   91.499940   99.859418  132.519296  140.521212  147.462259  207.72813   100
#  proc.jbaums(price.big)  117.242451  138.528214  170.989065  170.606048  180.337074  487.13615   100


现在,与for循环相比,我们的速度提高了1000倍,与矢量化R函数相比,速度提高了约70倍。即使只有这个大小,如果函数仅被调用一次,Rcpp相对矢量化R解决方案是否仍有很多优势尚不清楚,因为编译Rcpp代码肯定需要至少100 ms。如果这是一段您在分析中反复调用的代码,则提速将非常有吸引力。

关于r - 如何使用Rcpp加速for循环?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29223132/

相关文章:

python - Python中分组数据的累积自定义函数

r - 最新版本的 RSelenium 和 Firefox

java - 在 for 循环中使用 ArrayList.contains() 实际上比使用嵌套循环进行比较更有效吗?

java - 如何使用一组字符串过滤映射值?

javascript - 循环一些硬编码数据

r - 如何在 R 中的文件夹中打开并存储一系列图像?

r - 根据 R 中的其他因素处理日期

javascript - 什么更有效率?检查 == 或只是改变变量?

performance - 如何使用 LINQ 获得前两次出现?

python - pandas- 列中每个唯一字符串/组的新计算行