r - Efficiently compute a sum based on sequences in R

Tags: r performance statistics sum log-likelihood

I am trying to compute a specific sum in R as quickly as possible. The object of interest is:

[image: formula for the sum]

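The relevant input objects are two L × K matrices: x (containing only positive integers) and alpha (containing only positive real values). A is equivalent to rowSums(alpha), and N is equivalent to rowSums(x). The subscripts l and k denote the rows and columns of alpha and x, respectively.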
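Since the formula image is not available here, the following is a reconstruction read off the loop-based reference implementation given later in this question, not the original figure. The summation index j is a name introduced only for this sketch:

```latex
\[
S \;=\; \sum_{l=1}^{L} \left[ \sum_{k=1}^{K} \sum_{j=0}^{x_{lk}-1} \log\!\left(\alpha_{lk} + j\right)
\;-\; \sum_{j=0}^{N_l-1} \log\!\left(A_l + j\right) \right],
\qquad A_l = \sum_{k=1}^{K} \alpha_{lk}, \quad N_l = \sum_{k=1}^{K} x_{lk}.
\]
```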

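At first I thought it would be easy to come up with something very fast, but I could not find an elegant solution. I think a matrix-valued version of seq() would be very helpful here. Does anyone have a creative way to implement this efficiently?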

Here is an easy-to-read but obviously inefficient loop-based version for reference:

# parameters
L = 20
K = 5

# x ... L x K matrix of integers
x = matrix(1 : (L * K), L, K)

# alpha ... L x K matrix of positive real numbers
alpha = matrix(1 : (L * K) / 100, L, K)

# N ... sum over rows of x
N = rowSums(x)

# A ... sum over rows of alpha
A = rowSums(alpha)


# implementation 

stacksum = function(x, alpha, N, A){

  # parameters
  K = ncol(x)
  L = nrow(x)

  result = 0

  for(ll in 1:L){

    # first part of sum
    first.sum = 0

    for(kk in 1:K){

      # create sequence
      sequence.k = seq(alpha[ll, kk], (alpha[ll, kk] + x[ll, kk] - 1), 1)

      # take logs and sum
      first.sum = first.sum + sum(log(sequence.k))

    }

    # second part of sum
    second.sum = sum(log(seq(A[ll], (A[ll] + N[ll] - 1), 1)))

    # add to result
    result = result + first.sum - second.sum

  }

  return(result)

}

# test
stacksum(x, alpha, N, A)

Best answer

Updated with an lgamma solution based on @RobertDodier's comment.

Using sequence and rep.int:
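As a small illustration of these two building blocks, the sketch below builds two runs with non-integer starting values. It assumes R 4.0.0 or later, where sequence() accepts a `from` argument and coerces it to integer, which is why the fractional parts have to be added back separately. The vectors `counts` and `starts` are toy values made up for this example.

```r
# Toy inputs (hypothetical values, chosen only for illustration)
counts <- c(2L, 3L)      # length of each run
starts <- c(0.25, 1.5)   # non-integer starting value of each run

# sequence() coerces `from` to integer, so the runs start at 0 and 1 here
int_part <- sequence(counts, from = starts)

# rep.int() repeats each fractional part once per element of its run
frac_part <- rep.int(starts %% 1, counts)

# Adding the two gives the intended runs: (0.25, 1.25) and (1.5, 2.5, 3.5)
runs <- int_part + frac_part
print(runs)

# The same values built one run at a time with seq(), for comparison
by_hand <- c(seq(0.25, by = 1, length.out = 2),
             seq(1.5, by = 1, length.out = 3))
```

This is the same pattern that stacksum2 applies to the whole alpha matrix at once, using x as the run lengths.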

# parameters
L <- 20
K <- 5

# x ... L x K matrix of integers
x <- matrix(1 : (L * K), L, K)
# alpha ... L x K matrix of positive real numbers
alpha <- matrix(1 : (L * K) / 100, L, K)
# N ... sum over rows of x
N <- rowSums(x)
# A ... sum over rows of alpha
A <- rowSums(alpha)

# proposed solution
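stacksum2 <- function(x, alpha, N, A) {
  # sequence() works on integer starts, so the fractional part of each
  # starting value is added back with rep.int()
  sum(log(sequence(x, alpha) + rep.int(alpha %% 1, x))) -
    sum(log(sequence(N, A) + rep.int(A %% 1, N)))
}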

# solution from Robert Dodier's comments
stacksum3 <- function(x, alpha, N, A) {
  sum(lgamma(alpha + x) - lgamma(alpha)) - sum(lgamma(A + N) - lgamma(A))
}

# OP solution
stacksum1 = function(x, alpha, N, A){
  # parameters
  K = ncol(x)
  L = nrow(x)
  result = 0
  
  for(ll in 1:L){
    # first part of sum
    first.sum = 0
    for(kk in 1:K){
      # create sequence
      sequence.k = seq(alpha[ll, kk], (alpha[ll, kk] + x[ll, kk] - 1), 1)
      # take logs and sum
      first.sum = first.sum + sum(log(sequence.k))
    }
    # second part of sum
    second.sum = sum(log(seq(A[ll], (A[ll] + N[ll] - 1), 1)))
    # add to result
    result = result + first.sum - second.sum
  }
  result
}
res <- list(
  stacksum1(x, alpha, N, A),
  stacksum2(x, alpha, N, A),
  stacksum3(x, alpha, N, A)
)

all.equal(res[1:2], res[-1])
#> [1] TRUE

microbenchmark::microbenchmark(stacksum1 = stacksum1(x, alpha, N, A),
                               stacksum2 = stacksum2(x, alpha, N, A),
                               stacksum3 = stacksum3(x, alpha, N, A),
                               check = "equal")
#> Unit: microseconds
#>       expr    min      lq     mean  median      uq    max neval
#>  stacksum1 1654.2 1704.60 1899.384 1740.80 1964.75 4234.4   100
#>  stacksum2  238.2  246.45  258.284  252.35  268.40  319.4   100
#>  stacksum3   18.5   19.05   20.981   20.55   21.70   36.4   100
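The lgamma version is fastest because each inner sum of logs has a closed form: for a > 0 and a positive integer n, Γ(a + n) / Γ(a) = a (a + 1) ... (a + n - 1), so the sum of log(a + j) for j = 0, ..., n - 1 equals lgamma(a + n) - lgamma(a). A minimal check of that identity for a single pair of values is sketched below; `a` and `n` are hypothetical inputs standing in for one entry of alpha and one entry of x.

```r
# Hypothetical scalar inputs, chosen only for illustration
a <- 0.37   # positive real start (plays the role of one alpha entry)
n <- 6L     # positive integer count (plays the role of one x entry)

# Explicit sum of logs over a, a + 1, ..., a + n - 1
explicit <- sum(log(a + 0:(n - 1)))

# Closed form via the log-gamma function
closed <- lgamma(a + n) - lgamma(a)

# The two values should agree up to floating-point error
print(c(explicit = explicit, closed = closed))
```

Because lgamma is vectorized, stacksum3 can apply this identity to every entry of alpha and x in one call, without building any sequences.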

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/73226191/
