r - Apply a function to a variable subset of columns

Consider this toy data frame:

df <- data.frame(id = c(1, 2),
             meandoy = c(3,2),
             temp199701 = c(4,2),
             temp199702 = c(15,10),
             temp199703 = c(-3,7),
             temp199704 = c(-1,6),
             temp199801 = c(1,5),
             temp199802 = c(9,10),
             temp199803 = c(-2,2),
             temp199804 = c(-5,11))

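I want to add a new column for each year that holds the result of a function applied to each row. In other words, each new GDDyear column gets the value computed from tempyear01 through tempyear04.

I can do that with this: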

library(dplyr)

# Sum of the degrees above a base temperature of 5
sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

yearlist <- c(1997, 1998)

for (year in yearlist) {
  text <- paste0("GDD", year)
  df[[text]] <- df %>%                           # store result in this column
    dplyr::select(contains(toString(year))) %>%  # take variables that have year
    apply(1, sum.GDD)                            # calculate GDD5 across those variables
}

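But there is a twist. For each year, I only want to apply the function to the number of columns given in meandoy.

For example, in the first row, GDD1997 would be computed from the first 3 columns starting at temp199701, because meandoy = 3. GDD1998 would be computed from temp199801, temp199802, and temp199803.

In the second row, meandoy = 2, so GDD1997 would be computed from temp199701 and temp199702, and GDD1998 from temp199801 and temp199802.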
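To make the target concrete, the expected values for the toy data can be worked out by hand. The sketch below only checks the arithmetic described above; the variable names (gdd1997_row1 and so on) are hypothetical and exist purely for illustration.

```r
# Same function as in the question: sum of degrees above a base of 5
sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

# Row 1 (meandoy = 3): first three columns of each year
gdd1997_row1 <- sum.GDD(c(4, 15, -3))  # temp199701..temp199703
gdd1998_row1 <- sum.GDD(c(1, 9, -2))   # temp199801..temp199803

# Row 2 (meandoy = 2): first two columns of each year
gdd1997_row2 <- sum.GDD(c(2, 10))      # temp199701, temp199702
gdd1998_row2 <- sum.GDD(c(5, 10))      # temp199801, temp199802

# Illustrative names only; values are 10, 4, 5, 5
c(gdd1997_row1, gdd1998_row1, gdd1997_row2, gdd1998_row2)
```

So the desired output has GDD1997 = 10 and GDD1998 = 4 for id 1, and GDD1997 = 5 and GDD1998 = 5 for id 2.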

Best answer

When in doubt, the problem can often be simplified by converting the data to long format.

Since you are already using dplyr, we can do:

library(dplyr)
library(tidyr)  # pivot_longer() and pivot_wider()

totals <- df %>%
  # Reshape to long format (id, meandoy, year, doy, value), parsing year and
  # doy out of the column names while unpivoting.
  pivot_longer(
    -c(id, meandoy),
    names_to = c("year", "doy"), names_pattern = "temp(\\d{4})(\\d{2})",
    names_transform = list(year = as.integer, doy = as.integer)
  ) %>%
  # Keep only the rows that came from columns year01 to year<meandoy>.
  filter(doy <= meandoy) %>%
  # Calculate the GDD per id and year.
  group_by(id, year) %>%
  summarize(total = sum.GDD(value), .groups = "drop") %>%
  # Back to the original wide format, one GDD<year> column per year.
  pivot_wider(names_from = year, values_from = total, names_prefix = "GDD")

left_join(df, totals, by = "id")

This should be faster than approaches that work row by row and/or use loops.
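As a sanity check on the long-format approach, the same numbers can be reproduced with a plain row-wise version in base R. This is a minimal sketch, not part of the original answer: gdd_by_row and check are hypothetical names, and it assumes meandoy is always between 1 and the number of temp columns for each year.

```r
df <- data.frame(id = c(1, 2), meandoy = c(3, 2),
                 temp199701 = c(4, 2), temp199702 = c(15, 10),
                 temp199703 = c(-3, 7), temp199704 = c(-1, 6),
                 temp199801 = c(1, 5), temp199802 = c(9, 10),
                 temp199803 = c(-2, 2), temp199804 = c(-5, 11))

sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

# Hypothetical helper: for one year, apply sum.GDD to the first
# <meandoy> temp columns of every row.
gdd_by_row <- function(data, year) {
  # Columns for this year, sorted so temp<year>01 comes first
  cols <- sort(grep(paste0("^temp", year), names(data), value = TRUE))
  vapply(seq_len(nrow(data)), function(i) {
    keep <- cols[seq_len(data$meandoy[i])]  # first <meandoy> columns
    sum.GDD(unlist(data[i, keep]))
  }, numeric(1))
}

# Work on a copy so df itself is left unchanged
check <- df
for (year in c(1997, 1998)) {
  check[[paste0("GDD", year)]] <- gdd_by_row(df, year)
}

check[, c("id", "meandoy", "GDD1997", "GDD1998")]
```

For the toy data this yields GDD1997 = 10, 5 and GDD1998 = 4, 5, which is what the joined result should also contain. It is only meant for checking small inputs, since it loops over rows.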

Regarding r - Apply a function to a variable subset of columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/69551897/
