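I have written a for loop that I would like to speed up using the Rcpp library. I am not very familiar with C++. Can you help me make my function faster?
Thank you for your help!
I have included my algorithm, my code, the input and output, and my sessionInfo.
Here is my algorithm:
If the current price is higher than the previous price, mark +1 in the column named TR.
If the current price is lower than the previous price, mark -1 in the column named TR.
If the current price is the same as the previous price,
carry the previous row's TR value forward in the column named TR.
Here is my code: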
price <- c(71.91, 71.82, 71.81, 71.81, 71.81, 71.82, 71.81, 71.81, 71.81,
71.82, 71.81, 71.81, 71.8, 71.81, 71.8, 71.81, 71.8, 71.8, 71.8,
71.8, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81,
71.81, 71.82, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.8,
71.8, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.81, 71.81,
71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.81, 71.81, 71.81, 71.82, 71.82,
71.81, 71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.81,
71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.81,
71.82, 71.82, 71.82, 71.82, 71.83, 71.82, 71.82, 71.82, 71.81,
71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82,
71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.83, 71.83, 71.83, 71.83,
71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83,
71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83,
71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83,
71.83)
TR <- numeric(length(price)-1)
TR <- c(NA,TR)
for (i in 1: (length(price)-1)){
if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
if (price[i] < price[i+1]) {TR[i+1] = 1}
if (price[i] > price[i+1]) {TR[i+1] = -1}
}
Here is my output: dput(TR) yields
c(NA, -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, -1, 1,
-1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1,
-1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1,
1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1,
1, 1, -1, 1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, 1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Here is my sessionInfo:
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.4
loaded via a namespace (and not attached):
[1] chron_2.3-45 plyr_1.8.1 Rcpp_0.11.1 reshape2_1.4 stringr_0.6.2 tools_3.1.2
Best answer
You can translate the for loop directly:
library(Rcpp)
cppFunction(
"IntegerVector proc(NumericVector x) {
const int n = x.size();
IntegerVector y(n);
y[0] = NA_INTEGER;
for (int i=1; i < n; ++i) {
if (x[i] == x[i-1]) y[i] = y[i-1];
else if (x[i] > x[i-1]) y[i] = 1;
else y[i] = -1;
}
return y;
}")
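For readers less familiar with C++, the same loop can be tried outside R. The following is a minimal sketch in plain C++, not the Rcpp function itself: the names tick_direction and kNA are hypothetical, std::vector stands in for the Rcpp vector types, and INT_MIN is used as the missing-value marker because that is how R represents NA_INTEGER. It also returns early on empty input, an edge case in which the Rcpp version would write to y[0] of a zero-length vector.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical sentinel standing in for R's NA_INTEGER (represented as INT_MIN).
const int kNA = INT_MIN;

// Tick-rule direction: +1 on an uptick, -1 on a downtick, and the previous
// direction carried forward when the price is unchanged.
std::vector<int> tick_direction(const std::vector<double>& x) {
    std::vector<int> y(x.size(), 0);
    if (x.empty()) return y;  // nothing to mark; avoids touching y[0]
    y[0] = kNA;               // the first price has no predecessor
    for (std::size_t i = 1; i < x.size(); ++i) {
        if (x[i] == x[i - 1])     y[i] = y[i - 1];  // unchanged: carry forward
        else if (x[i] > x[i - 1]) y[i] = 1;         // uptick
        else                      y[i] = -1;        // downtick
    }
    return y;
}
```

Like the Rcpp version, this sketch does not treat missing prices specially: a NaN compares false in both tests and falls through to the -1 branch, so inputs with missing values would need a separate check.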
In general, Rcpp gives a substantial speedup over a for loop in base R:
proc.for <- function(price) {
TR <- numeric(length(price)-1)
TR <- c(NA,TR)
for (i in 1: (length(price)-1)){
if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
if (price[i] < price[i+1]) {TR[i+1] = 1}
if (price[i] > price[i+1]) {TR[i+1] = -1}
}
return(TR)
}
proc.aaron <- function(price) {
change <- sign(diff(price))
good <- change != 0
goodval <- change[good]
c(NA, goodval[cumsum(good)])
}
proc.jbaums <- function(price) {
TR <- sign(diff(price))
TR[TR==0] <- TR[which(TR != 0)][findInterval(which(TR == 0), which(TR != 0))]
TR
}
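# all.equal() compares only its first two arguments, so check each pair separately.
# proc.jbaums() returns no leading NA, hence the [-1].
all.equal(proc(price), proc.for(price))
# [1] TRUE
all.equal(proc.for(price), proc.aaron(price))
# [1] TRUE
all.equal(proc.for(price)[-1], proc.jbaums(price))
# [1] TRUE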
library(microbenchmark)
microbenchmark(proc(price), proc.for(price), proc.aaron(price), proc.jbaums(price))
# Unit: microseconds
# expr min lq mean median uq max neval
# proc(price) 1.871 2.5380 3.92111 3.1110 4.5880 15.318 100
# proc.for(price) 408.200 448.2830 542.19766 484.1265 546.3255 1821.104 100
# proc.aaron(price) 23.916 25.5770 33.53259 31.5420 35.8575 190.372 100
# proc.jbaums(price) 33.536 38.8995 46.80109 43.4510 49.3555 112.306 100
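For the vector provided, comparing medians, we see a speedup of more than 100x over the for loop and about 10x over the vectorized alternatives.
With a larger vector (length one million tested here), the speedup is even more pronounced: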
price.big <- rep(price, times=5000)
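# As above, compare pairwise; all.equal() only looks at its first two arguments.
all.equal(proc(price.big), proc.for(price.big))
# [1] TRUE
all.equal(proc.for(price.big), proc.aaron(price.big))
# [1] TRUE
all.equal(proc.for(price.big)[-1], proc.jbaums(price.big))
# [1] TRUE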
microbenchmark(proc(price.big), proc.for(price.big), proc.aaron(price.big), proc.jbaums(price.big))
# Unit: milliseconds
# expr min lq mean median uq max neval
# proc(price.big) 1.442119 1.818494 5.094274 2.020437 2.771903 56.54321 100
# proc.for(price.big) 2639.819536 2699.493613 2949.962241 2781.636460 3062.277930 4472.35369 100
# proc.aaron(price.big) 91.499940 99.859418 132.519296 140.521212 147.462259 207.72813 100
# proc.jbaums(price.big) 117.242451 138.528214 170.989065 170.606048 180.337074 487.13615 100
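Now the speedup is more than 1000x over the for loop and about 70x over the fastest vectorized R function. Even at this size, it is not clear that Rcpp has much of an advantage over the vectorized R solutions if the function is called only once, because compiling the Rcpp code surely takes at least 100 ms. If this is a piece of code that you call repeatedly in your analysis, the speedup is quite attractive.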
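The trade-off between compile time and per-call saving can be put into numbers with a back-of-the-envelope calculation. The sketch below is plain C++; the function name break_even_calls is hypothetical, and the compile times passed to it are assumptions rather than measurements. Only the median timings come from the benchmarks above.

```cpp
#include <cmath>

// Number of calls needed before a one-off compile cost is repaid by the
// per-call saving. Returns -1 when the compiled version is not faster.
long break_even_calls(double compile_ms, double r_call_ms, double cpp_call_ms) {
    const double saving_ms = r_call_ms - cpp_call_ms;  // time saved per call
    if (saving_ms <= 0.0) return -1;
    return static_cast<long>(std::ceil(compile_ms / saving_ms));
}
```

With the medians for the one-million-element vector (140.5 ms for proc.aaron, 2.0 ms for proc), break_even_calls(100.0, 140.521212, 2.020437) gives 1, and an assumed 5000 ms compile gives 37. For the original 200-element vector the saving per call is only about 28 microseconds, so a 100 ms compile needs roughly 3,500 calls to pay off.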
Source: "r - How to speed up a for loop using Rcpp?" on Stack Overflow: https://stackoverflow.com/questions/29223132/