r - How to store for-loop output of varying sizes in an initialized vector by index

Problem statement

Suppose you have the following data:

df <- data.frame(x = rep(0, 10),
                 batch = rep(1:3,c(4,2,4)))

   x batch
1  0     1
2  0     1
3  0     1
4  0     1
5  0     2
6  0     2
7  0     3
8  0     3
9  0     3
10 0     3

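You want to loop over the unique batches in the dataset and, within each batch, apply an algorithm that generates a vector of 1's and 0's. The real algorithm is fairly long, so for the sake of the example assume it is a random sample: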

set.seed(2021)

for(i in seq_len(length(unique(df$batch)))){
  batch_val <- df[which(df$batch == i),]$batch
  #some algorithm to generate 1's and 0's, but using sample() here
  out_x <- sample(c(0,1), length(batch_val), replace = TRUE)
}

You then want to save out_x into the correct positions of df$x. My current, basic approach is to specify the indices explicitly:

idxb <- 1
idxe <- length(df[which(df$batch == 1),]$batch)

set.seed(2021)
for(i in seq_len(length(unique(df$batch)))){
  batch_val <- df[which(df$batch == i),]$batch
  #some algorithm to generate 1's and 0's, but using sample() here
  out_x <- sample(c(0,1), length(batch_val), replace = TRUE)
  print(out_x)

  #save output
  df$x[idxb:idxe] <- out_x
  
  #update indices
  idxb <- idxb + length(out_x)
  
  if(i < length(unique(df$batch))) {
    idxe <- idxe + length(df[which(df$batch == i+1),]$batch) 
  }
}

Output

The result should look like this:

    x batch
1   0     1
2   1     1
3   1     1
4   0     1
5   1     2
6   1     2
7   1     3
8   0     3
9   1     3
10  1     3

where each iteration of out_x looks like this:

[1] 0 1 1 0
[1] 1 1
[1] 1 0 1 1

Question

What is a faster way to do this while still using base R?

Best answer

How about using tapply?

out_x <- tapply(df$batch, df$batch, function(x) sample(c(0,1), length(x), replace = T))

#------
$`1`
[1] 0 1 1 1

$`2`
[1] 0 1

$`3`
[1] 1 1 1 1

Then assign the result back to df:

df$x <- unlist(out_x)
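
One caveat worth illustrating: unlist() concatenates the tapply results in batch order, so the assignment above lines up with df only when the rows are already sorted by batch, as they are in the example data. If that is not guaranteed, base R's ave() is one possible alternative, since it returns its results in the original row order. The following is a minimal sketch under that assumption; make_flags is a hypothetical stand-in for the real algorithm, and the row shuffle is added purely for illustration.

```r
# Same example data as in the question, but with the rows shuffled so the
# batches are no longer contiguous (an assumption made for illustration).
df <- data.frame(x = rep(0, 10),
                 batch = rep(1:3, c(4, 2, 4)))
set.seed(1)
df <- df[sample(nrow(df)), ]

# Hypothetical stand-in for the real algorithm: one 0/1 value per row.
make_flags <- function(v) sample(c(0, 1), length(v), replace = TRUE)

# ave() applies FUN within each level of df$batch and puts each result
# back at the rows it came from, so no index bookkeeping is needed.
df$x <- ave(df$x, df$batch, FUN = make_flags)
df
```

FUN must return a vector of the same length as its input for the values to map one-to-one onto the rows of each batch.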

Timing test:

microbenchmark::microbenchmark(f_loop(), f_apply())

#---------
Unit: microseconds
      expr     min       lq     mean  median      uq      max neval
  f_loop() 399.895 425.1975 442.7077 437.754 450.690  612.969   100
 f_apply() 100.449 106.9185 160.5557 110.913 114.909 4867.603   100

The functions are defined as

f_loop <- function(){
  
  idxb <- 1
  idxe <- length(df[which(df$batch == 1),]$batch)

  for(i in seq_len(length(unique(df$batch)))){
    
    batch_val <- df[which(df$batch == i),]$batch
    #some algorithm to generate 1's and 0's, but using sample() here
    out_x <- sample(c(0,1), length(batch_val), replace = T)
    #print(out_x)
    
    #save output
    df$x[idxb:idxe] <- out_x
    
    #update indices
    idxb <- idxb + length(out_x)
    
    if(i < length(unique(df$batch))) {
      idxe <- idxe + length(df[which(df$batch == i+1),]$batch) 
    }
  }
  
  return(df$x)
}


f_apply <- function() {
  unlist(tapply(df$batch, df$batch, function(x) sample(c(0,1), length(x), replace = T)))
}
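
If a loop is still preferred, the begin/end counters can probably be dropped by indexing df$x with a logical mask for each batch. The sketch below is not part of the original answer: f_mask and f_tapply are hypothetical names, and both take df as an argument instead of reading it from the enclosing environment. It also checks that, for this example data and a fixed seed, the mask loop and the tapply approach give the same vector, which should hold because both call sample() once per batch in the same order.

```r
df <- data.frame(x = rep(0, 10),
                 batch = rep(1:3, c(4, 2, 4)))

# Loop variant: a logical mask selects the rows of each batch, so the
# output is written to the right positions without idxb/idxe.
f_mask <- function(df) {
  for (b in unique(df$batch)) {
    rows <- df$batch == b
    # stand-in for the real algorithm, as in the question
    df$x[rows] <- sample(c(0, 1), sum(rows), replace = TRUE)
  }
  df$x
}

# tapply variant from the answer, with the names stripped for comparison.
f_tapply <- function(df) {
  unname(unlist(tapply(df$batch, df$batch,
                       function(v) sample(c(0, 1), length(v), replace = TRUE))))
}

set.seed(2021); via_mask   <- f_mask(df)
set.seed(2021); via_tapply <- f_tapply(df)
identical(via_mask, via_tapply)
```

Unlike unlist(tapply(...)), the mask version does not depend on the rows being sorted by batch. No timing is claimed for it here; it could be added to the microbenchmark call above to compare.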

Regarding r - How to store for-loop output of varying sizes in an initialized vector by index, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65798084/
