I want to cut a date range using custom breaks (0-7 days, 8-15 days, ..., 31-50 days) and then compute the mean of each value column per group.
library(dplyr)
date = seq(as.Date("2019/1/1"), by = "day", length.out = 50)
value = matrix(rnorm(200, 100, 50), nrow=50) %>% data.frame()
sample = cbind(date, value) %>% data.frame()
breaks = c(0, 7, 15, 30, 50)
sample %>%
group_by(cutt = cut(date, breaks=breaks)) %>%
summarise(m1 = mean(X1), m2=mean(X2))
However, the cut() function only seems to accept interval names such as "day" or "week" when cutting dates. Is there any way to do this?
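For reference, here is a minimal sketch of how these breaks behave on a plain numeric day index. It assumes day 1 is the first date. cut() on numbers uses right-closed intervals by default, so c(0, 7, 15, 30, 50) yields exactly the 1-7, 8-15, 16-30 and 31-50 day groups described above.

```r
# Day number of each of the 50 consecutive dates (assumption: day 1 = first date)
day <- 1:50
breaks <- c(0, 7, 15, 30, 50)

# cut() on numeric input defaults to right = TRUE, i.e. intervals (a, b]
groups <- cut(day, breaks)

# Number of days per group: 7, 8, 15, 20
counts <- table(groups)
counts
```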
Best answer
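We can convert the dates to a factor and then to numeric. This maps each date to its position among the sorted unique dates (1, 2, ..., 50 here, because the dates are consecutive days), and those numbers can be cut with the numeric breaks.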
library(dplyr)
sample %>%
group_by(cutt=cut(as.numeric(factor(date)), breaks=breaks)) %>%
summarise(m1=mean(X1), m2=mean(X2))
# # A tibble: 4 x 3
# cutt m1 m2
# <fct> <dbl> <dbl>
# 1 (0,7] 126. 120.
# 2 (7,15] 123. 90.3
# 3 (15,30] 82.6 107.
# 4 (30,50] 90.4 104.
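The factor trick relies on the dates being consecutive with no repeats, because as.numeric(factor(date)) gives ranks of unique dates, not elapsed days. The sketch below uses a small hypothetical date vector, not the question's data, to compare those ranks with a day offset computed directly from the date difference. The offset approach is a swapped-in alternative, not part of the original answer.

```r
# Hypothetical dates with a repeated day and a gap (not the question's data)
d <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-02", "2019-01-10"))
breaks <- c(0, 7, 15, 30, 50)

# Approach from the answer: rank of each date among the sorted unique dates
rank_idx <- as.numeric(factor(d))                      # 1 2 2 3

# Alternative: days elapsed since the first date, counting the first day as 1
day_idx <- as.numeric(d - min(d), units = "days") + 1  # 1 2 2 10

cut(rank_idx, breaks)  # all four dates fall in (0,7]
cut(day_idx, breaks)   # 2019-01-10 is day 10, so it falls in (7,15]
```

On the question's data (50 consecutive days) both indices are identical, so the answer's grouping is unaffected there.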
Or in base R:
do.call(rbind, by(sample[2:3], cut(as.numeric(factor(sample$date)), breaks), colMeans))
# X1 X2
# (0,7] 125.79941 120.01652
# (7,15] 122.82247 90.33681
# (15,30] 82.64698 107.13250
# (30,50] 90.39701 104.09779
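cut() on a Date vector also accepts a vector of Date cut points, not only strings such as "day" or "week". The sketch below assumes the same sample and breaks as in the Data section and shifts the numeric breaks onto the calendar by adding them to the first date. cut.Date() defaults to right = FALSE, so the groups are [start, start + 7), [start + 7, start + 15), and so on. For consecutive daily data these match the (0,7], (7,15], ... groups above, but the levels are labelled with each group's start date.

```r
# Rebuild the example data exactly as in the Data section
set.seed(42)
n <- 50
sample <- data.frame(date = seq(as.Date("2019/1/1"), by = "day", length.out = n),
                     matrix(rnorm(4 * n, 100, 50), ncol = 4,
                            dimnames = list(NULL, paste0("X", 1:4))))
breaks <- c(0, 7, 15, 30, 50)

# Date cut points: first date + 0, 7, 15, 30, 50 days (left-closed intervals)
start <- min(sample$date)
grp_date <- cut(sample$date, breaks = start + breaks)

# Grouping used in the answer, kept for comparison
grp_rank <- cut(as.numeric(factor(sample$date)), breaks)

# Per-group means of X1 and X2, one row per date group
means_date <- aggregate(sample[c("X1", "X2")], by = list(grp = grp_date), FUN = mean)
means_date
```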
Data
set.seed(42)
n <- 50
sample <- data.frame(date=seq(as.Date("2019/1/1"), by="day", length.out=n),
matrix(rnorm(4*n, 100, 50), ncol=4,
dimnames=list(NULL, paste0("X", 1:4))))
breaks <- c(0, 7, 15, 30, 50)
Regarding "r - Cut a date vector using custom breaks", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56843548/