c++ - Efficient Perceptually Important Points (PIP) algorithm in R or Rcpp

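I am trying to write an algorithm that finds the Perceptually Important Points (PIPs) of a time series. These are the points that "shape" or "characterize" the series. The idea is simple. The algorithm connects the first and the last point of the time series and "draws" a line between them. In the next step it searches the series for the point with the largest "distance" to this imaginary line (the distance can be measured simply as the vertical distance or the Euclidean distance). That point is the next PIP. There are now two lines: one connecting the first point to the new PIP, and one connecting the new PIP to the last point. The algorithm then repeats the same step: it checks both lines for the point that is farthest away and makes that point the next PIP.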
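To make the two distance measures concrete, here is a minimal C++ sketch (plain standard library, no Rcpp) of how they might be computed for a candidate point `i` lying between two neighbouring PIPs `s` and `e`. The function names are hypothetical. The Euclidean variant follows the R code in this question, which sums the distances to both endpoints rather than measuring the perpendicular distance to the line.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Vertical distance from point i to the straight line through points s and e.
// Assumption: x[s] != x[e], otherwise the slope is undefined.
double verticalDistance(const std::vector<double>& x, const std::vector<double>& y,
                        std::size_t s, std::size_t e, std::size_t i) {
    const double slope = (y[e] - y[s]) / (x[e] - x[s]);
    const double onLine = y[s] + slope * (x[i] - x[s]);  // line value at x[i]
    return std::fabs(y[i] - onLine);
}

// "Euclidean" distance as used in the question's R code:
// distance from point i to s plus distance from point i to e.
double euclideanDistance(const std::vector<double>& x, const std::vector<double>& y,
                         std::size_t s, std::size_t e, std::size_t i) {
    return std::hypot(x[s] - x[i], y[s] - y[i]) +
           std::hypot(x[e] - x[i], y[e] - y[i]);
}
```

With this definition the Euclidean sum is never zero for an interior point, while the vertical distance is zero for any point that lies exactly on the line.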

Here is a visualization of the algorithm, followed by the pseudocode:

Figure: identification of the first 5 PIPs using the vertical distance (Fu et al., 2008)

Pseudocode:

Function findPIPs(P)
Input: sequence P[1..m]
Output: PIPList L[1..m]
Begin
    Set L[1] = P[1], L[2] = P[m]
    Repeat until L[1..m] all filled
    Begin
        Select point P[j] with maximum distance to the adjacent points in PIPList (L[1] and L[2] initially)
        Append P[j] to L
    End
    Return L
End
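One possible reading of the pseudocode, sketched in standard C++ so that it could later be wrapped with Rcpp. The function name `findPIPs`, the 0-based indices, and the parameter `k` (how many PIPs to return) are assumptions made for illustration. The sketch uses the vertical distance and assumes strictly increasing `x` values without missing data.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the 0-based indices of the first k PIPs in the order they were selected.
// Hypothetical helper: vertical distance only, x must be strictly increasing.
std::vector<std::size_t> findPIPs(const std::vector<double>& x,
                                  const std::vector<double>& y,
                                  std::size_t k) {
    const std::size_t n = y.size();
    std::vector<std::size_t> order;            // PIPs in order of selection
    if (n == 0 || k == 0) return order;
    order.push_back(0);                        // first point
    if (n == 1 || k == 1) return order;
    order.push_back(n - 1);                    // last point
    std::vector<std::size_t> sorted = {0, n - 1};  // same PIPs, sorted by index
    k = std::min(k, n);

    while (order.size() < k) {
        double bestDist = -1.0;                // below any real distance, so ties at 0 still win
        std::size_t bestIdx = 0;
        // Scan every segment between two adjacent PIPs.
        for (std::size_t seg = 0; seg + 1 < sorted.size(); ++seg) {
            const std::size_t s = sorted[seg];
            const std::size_t e = sorted[seg + 1];
            const double slope = (y[e] - y[s]) / (x[e] - x[s]);
            for (std::size_t i = s + 1; i < e; ++i) {
                const double d = std::fabs(y[i] - (y[s] + slope * (x[i] - x[s])));
                if (d > bestDist) {
                    bestDist = d;
                    bestIdx = i;
                }
            }
        }
        order.push_back(bestIdx);
        // Keep the index-sorted list in order for the next round.
        sorted.insert(std::upper_bound(sorted.begin(), sorted.end(), bestIdx), bestIdx);
    }
    return order;
}
```

Because `bestDist` starts at -1, a point that lies exactly on a line (distance 0) is still selected once no point with a larger distance remains. Each round scans all remaining points, so returning `k` PIPs costs roughly `k * n` distance evaluations.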

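I tried to implement this in R, but it seems very inefficient and takes a long time. There is also a small "bug": sometimes points lie exactly on the interpolated line, so their distance is 0, and the algorithm does not currently account for that.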

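In the end I suspect I will need to write the code in C++ via Rcpp. Still, is there a way to make the R code more efficient, for example through vectorization?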

Here is my R code:

# PIPs

# -> Input
## vector
# <- Output
## PIP indices

getPIPs <- function(x, y, distance = "EUK") {
    PIPs <- vector("list", 4)
    PIPs[[1]] <- c(1, length(y)) 
  
    for(i in 1:(length(y)-2)) {
        switch(distance,
            EUK = (DISTANCE.F <- EUK.distance.f),
            VD  = (DISTANCE.F <- VD.distance.f ),
        )
    
        PIPs <- helper.f(PIPs, x, y, DISTANCE.F)
    }
  
    return(PIPs)
}

helper.f <- function(PIPs, x, y, DISTANCE.F) {
    t <- sort(PIPs[[1]])
    gesamt <- NULL
  
    for(z in 1:(length(t)-1)) {  
        gesamt <- c(gesamt,DISTANCE.F(x, y, t[z], t[z+1]))
    }

    if(all(gesamt == 0)) return(PIPs)
    else PIPs[[1]] <- append(PIPs[[1]], which.max(gesamt))
  
    return(PIPs)
}

EUK.distance.f <- function(x, y, sI, eI) {
    pointsbetween <- sI:eI

    erg <- 
    sqrt((x[sI] - x[pointsbetween])^2 + (y[sI] - y[pointsbetween])^2) + 
    sqrt((x[eI] - x[pointsbetween])^2 + (y[eI] - y[pointsbetween])^2)
    erg[1] <- 0
    erg <- erg[-length(erg)]

    return(erg)
}

VD.distance.f <- function(x, y, sI, eI) { # start and end index
    erg <- 
    abs(y[sI:eI] - (y[sI] +
        (x[sI:eI] - x[sI]) *
        ((y[eI] - y[sI]) / (x[eI] - x[sI]))
        )
    )
    erg <- erg[-length(erg)]

    return(erg)
}

#visualize
itertivePlotPIPS.f <- function(x, y, z) {
    plot(x, y)
    lines(sort(PIPs[[1]][1:length(x)]), y[sort(PIPs[[1]][1:length(x)])], col  = "azure3")
    lines(sort(PIPs[[1]][1:z]), y[sort(PIPs[[1]][1:z])])
}

Run the code:

x <- 1:100 # "Time" (x)-axis
y <- sample(1:100) # "Data" y-axis
PIPs <- getPIPs(x, y, "EUK") # keep the result; the plotting function reads PIPs

Visualize the data:

itertivePlotPIPS.f(x,y,10) # the 10 at the end means "take the first ten PIPs"

I hope this is not too confusing. I tried to keep it simple.

References: Fu, Tak-chung, et al. (2008). "Representing financial time series based on data point importance". In: Engineering Applications of Artificial Intelligence 21.2, pp. 277-300. ISSN: 0952-1976. doi: http://dx.doi.org/10.1016/j.engappai.2007.04.009. url: http://www.sciencedirect.com/science/article/pii/S0952197607000577.

Best answer

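This may be faster; I have not actually tested it against your code. I think it works (?!), but it should be tested. It only checks the vertical distance to the line, and checking the Euclidean distance to the line would take more work. Mostly it just avoids explicit for loops, which may help speed the code up.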

## Some test data
tst <- data.frame(x=1:100, y=rnorm(100, 4*sin(seq(1,4*pi,len=100)), 1))
tst <- as.matrix(tst)

pip <- function(ps, interp=NULL, breakpoints=NULL) {
    if (missing(interp)) {
        interp <- approx(x=c(ps[1,"x"], ps[nrow(ps),"x"]), 
                         y=c(ps[1,"y"],ps[nrow(ps),"y"]), n=nrow(ps))
        interp <- do.call(cbind, interp)
        breakpoints <- c(1, nrow(ps))
    } else {
        ds <- sqrt(rowSums((ps - interp)^2))  # x values match, so this is the vertical distance
        ind <- which.max(ds)
        ends <- c(min(ind-breakpoints[breakpoints<ind]), min(breakpoints[breakpoints>ind]-ind))
        leg1 <- approx(x=c(ps[ind-ends[1],"x"], ps[ind,"x"]),
                         y=c(ps[ind-ends[1],"y"], ps[ind,"y"]), n=ends[1]+1)
        # the right leg also spans ends[2]+1 points (ind through ind+ends[2])
        leg2 <- approx(x=c(ps[ind,"x"], ps[ind+ends[2],"x"]),
                         y=c(ps[ind,"y"], ps[ind+ends[2],"y"]), n=ends[2]+1)
        interp[(ind-ends[1]):ind, "y"] <- leg1$y
        interp[ind:(ind+ends[2]), "y"] <- leg2$y
        breakpoints <- c(breakpoints, ind)
    }
    list(interp=interp, breakpoints=breakpoints)
}

constructPIP <- function(ps, times=10) {
    res <- pip(ps)
    for (i in seq_len(times - 1)) {  # same count as 2:times, but also safe for times = 1
        res <- pip(ps, res$interp, res$breakpoints)
    }
    res
}

res <- constructPIP(tst, times=5)
plot(tst)
points(res$interp, col="blue", type="l")
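
The same idea of caching the interpolated line and refreshing only the two legs next to the new breakpoint could be carried over to C++. The following is a rough sketch under that assumption, not a port of the R code above. `pipBreakpoints` and `fillLeg` are hypothetical names, indices are 0-based, `x` is assumed to be strictly increasing, and only the vertical distance is handled.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Overwrite line[s..e] with the straight line from (x[s], y[s]) to (x[e], y[e]).
// Assumes s < e and x[s] != x[e].
void fillLeg(const std::vector<double>& x, const std::vector<double>& y,
             std::size_t s, std::size_t e, std::vector<double>& line) {
    const double slope = (y[e] - y[s]) / (x[e] - x[s]);
    for (std::size_t i = s; i <= e; ++i) {
        line[i] = y[s] + slope * (x[i] - x[s]);
    }
}

// Returns breakpoint indices: both endpoints first, then up to `times`
// further points in the order they were picked.
std::vector<std::size_t> pipBreakpoints(const std::vector<double>& x,
                                        const std::vector<double>& y,
                                        std::size_t times) {
    const std::size_t n = y.size();
    std::vector<std::size_t> breakpoints;
    if (n < 2) return breakpoints;             // nothing to interpolate

    std::vector<bool> isBreak(n, false);
    isBreak[0] = true;
    isBreak[n - 1] = true;
    breakpoints.push_back(0);
    breakpoints.push_back(n - 1);

    std::vector<double> line(n);               // cached piecewise-linear approximation
    fillLeg(x, y, 0, n - 1, line);

    for (std::size_t t = 0; t < times && breakpoints.size() < n; ++t) {
        // Find the non-breakpoint farthest (vertically) from the cached line.
        std::size_t ind = 0;
        double best = -1.0;
        for (std::size_t i = 1; i + 1 < n; ++i) {
            if (isBreak[i]) continue;
            const double d = std::fabs(y[i] - line[i]);
            if (d > best) {
                best = d;
                ind = i;
            }
        }
        // Locate the neighbouring breakpoints on each side.
        std::size_t left = ind;
        std::size_t right = ind;
        while (!isBreak[left]) --left;
        while (!isBreak[right]) ++right;
        // Only the two legs touching the new breakpoint change.
        fillLeg(x, y, left, ind, line);
        fillLeg(x, y, ind, right, line);
        isBreak[ind] = true;
        breakpoints.push_back(ind);
    }
    return breakpoints;
}
```

Each round still scans all points to find the maximum, but the line values are reused between rounds and only the affected stretch is recomputed.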


This question and answer come from Stack Overflow: https://stackoverflow.com/questions/30428900/
