Consider this toy data frame:
df <- data.frame(id = c(1, 2),
meandoy = c(3,2),
temp199701 = c(4,2),
temp199702 = c(15,10),
temp199703 = c(-3,7),
temp199704 = c(-1,6),
temp199801 = c(1,5),
temp199802 = c(9,10),
temp199803 = c(-2,2),
temp199804 = c(-5,11))
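I want to add a new column for each year, holding the result of a function applied across each row. In other words, each new GDDyear column gets the value calculated from tempyear01 through tempyear04.
I can achieve that with this: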
library(dplyr)

sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)
yearlist <- c(1997, 1998)
for (year in yearlist) {
  text <- paste0("GDD", year)
  df[[text]] <- df %>%                          # store the result in this column
    dplyr::select(contains(toString(year))) %>% # keep the columns whose name contains the year
    apply(1, sum.GDD)                           # calculate GDD5 across those columns, row by row
}
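But there is a twist. For each year, I only want to apply the function to the number of columns specified in meandoy.
For example, in the first row, GDD1997 would be the result calculated from the first 3 columns starting at temp199701, because meandoy = 3. GDD1998 would take its result from temp199801, temp199802, and temp199803.
In the second row, meandoy = 2, so GDD1997 would be calculated from temp199701 and temp199702, and GDD1998 from temp199801 and temp199802.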
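To make the target concrete, here is a small sketch that works out the expected values by hand from the toy data above. It only re-applies sum.GDD to the values that meandoy selects in each row; it is a check on the desired output, not a general solution.

```r
# Growing degree days above a base of 5, as defined in the question.
sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

# Row 1 (meandoy = 3): the first three columns of each year.
sum.GDD(c(4, 15, -3))  # GDD1997: only 15 exceeds 5, so 15 - 5 = 10
sum.GDD(c(1, 9, -2))   # GDD1998: only 9 exceeds 5, so 9 - 5 = 4

# Row 2 (meandoy = 2): the first two columns of each year.
sum.GDD(c(2, 10))      # GDD1997: 10 - 5 = 5
sum.GDD(c(5, 10))      # GDD1998: 5 is not strictly greater than 5, so 10 - 5 = 5
```

So the desired result is GDD1997 = (10, 5) and GDD1998 = (4, 5) for rows 1 and 2.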
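Best answer
When in doubt, you can often simplify the problem by converting the data to long format.
Since you are already using dplyr, we can do the following (pivot_longer and pivot_wider come from tidyr):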
library(dplyr)
library(tidyr)

totals <- df %>%
  # Turn the data frame into the format id, meandoy, year, doy, value by
  # parsing the column names while unpivoting.
  pivot_longer(
    c(everything(), -id, -meandoy),
    names_to = c("year", "doy"), names_pattern = "temp(\\d{4})(\\d{2})",
    names_transform = list(year = as.integer, doy = as.integer)
  ) %>%
  # Keep the rows that correspond to columns year01 through year<meandoy>
  # of the original df.
  filter(doy <= meandoy) %>%
  # Calculate the GDD per id and year.
  group_by(id, year) %>%
  summarize(total = sum.GDD(value), .groups = "drop") %>%
  # Back to the original wide format, one GDD<year> column per year.
  pivot_wider(names_from = year, values_from = total, names_prefix = "GDD")
left_join(df, totals, by = "id")
This should be faster than approaches that perform row-wise operations and/or loops.
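As a sanity check, the following self-contained sketch runs the long-format approach end to end on the toy data from the question. It assumes dplyr and a tidyr version that supports names_transform (1.1.0 or later); the name result is introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# GDD function and toy data, copied from the question.
sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

df <- data.frame(id = c(1, 2),
                 meandoy = c(3, 2),
                 temp199701 = c(4, 2),
                 temp199702 = c(15, 10),
                 temp199703 = c(-3, 7),
                 temp199704 = c(-1, 6),
                 temp199801 = c(1, 5),
                 temp199802 = c(9, 10),
                 temp199803 = c(-2, 2),
                 temp199804 = c(-5, 11))

totals <- df %>%
  # One row per (id, year, doy) combination.
  pivot_longer(
    c(everything(), -id, -meandoy),
    names_to = c("year", "doy"), names_pattern = "temp(\\d{4})(\\d{2})",
    names_transform = list(year = as.integer, doy = as.integer)
  ) %>%
  # Keep only the first meandoy entries of each year for each id.
  filter(doy <= meandoy) %>%
  group_by(id, year) %>%
  summarize(total = sum.GDD(value), .groups = "drop") %>%
  # One GDD<year> column per year.
  pivot_wider(names_from = year, values_from = total, names_prefix = "GDD")

# "result" is a hypothetical name used only in this sketch.
result <- left_join(df, totals, by = "id")
result[, c("id", "meandoy", "GDD1997", "GDD1998")]
```

For the toy data this gives GDD1997 = (10, 5) and GDD1998 = (4, 5). One caveat: an id for which no column passes the filter (for example, a meandoy of 0) drops out of totals entirely, so it ends up with NA rather than 0 after the join.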
Regarding "r - Apply a function to a variable subset of columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69551897/