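I have a data set with multiple dates. I want to lag the value of Cells by one month. I probably can't use shift() because each month has a different number of days (not to mention that some dates are missing).
What I did was create a new data table with the unique Year and Month, shift/lag Cells, and then merge it with the original data table (taking care not to end up with duplicate columns).
Clearly, this is not efficient. Is there another way?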
sapply(c('data.table', 'lubridate'), require, character.only = TRUE)
DT <- fread('DATE, ID, Cells
2000-01-01, 1, 10
2000-01-02, 1, 10
2000-01-03, 1, 10
2000-01-01, 2, 20
2000-01-02, 2, 20
2000-01-03, 2, 20
2000-01-04, 2, 20
2000-02-01, 1, 30
2000-02-02, 1, 30
2000-02-01, 2, 40
2000-02-03, 2, 40
2000-02-04, 2, 40
2000-03-01, 1, 50
2000-03-02, 1, 50
2000-03-01, 2, 60
2000-03-03, 2, 60
')
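# Parse the dates and derive the year and month used for grouping
DT[, date := as.Date(DATE, format = '%Y-%m-%d')][,
   c('Year', 'Month') := .(year(date), month(date))]
setkey(DT, Year, Month, ID)
# One row per Year/Month/ID; by = key(DT) makes the de-duplication explicit
# (duplicated(DT) only used the key by default in older data.table versions)
DT.Months <- unique(DT, by = key(DT))[, .(Year, Month, ID, Cells)]
# Lag Cells by one monthly row within each ID
DT.Months[, Lagged.Cells := shift(Cells, 1L, type = 'lag'), by = .(ID)]
# Join the lagged values back and drop the duplicated Cells column
DT <- DT[DT.Months, on = .(Year, Month, ID)][, i.Cells := NULL]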
# > DT # This is what I want.
# The Value in Cells is lagged by one month,
# regardless of the number of observations within a month for each ID.
# DATE ID Cells date Year Month Lagged.Cells
# 1: 2000-01-01 1 10 2000-01-01 2000 1 NA
# 2: 2000-01-02 1 10 2000-01-02 2000 1 NA
# 3: 2000-01-03 1 10 2000-01-03 2000 1 NA
# 4: 2000-01-01 2 20 2000-01-01 2000 1 NA
# 5: 2000-01-02 2 20 2000-01-02 2000 1 NA
# 6: 2000-01-03 2 20 2000-01-03 2000 1 NA
# 7: 2000-01-04 2 20 2000-01-04 2000 1 NA
# 8: 2000-02-01 1 30 2000-02-01 2000 2 10
# 9: 2000-02-02 1 30 2000-02-02 2000 2 10
#10: 2000-02-01 2 40 2000-02-01 2000 2 10
#11: 2000-02-03 2 40 2000-02-03 2000 2 20
#12: 2000-02-04 2 40 2000-02-04 2000 2 20
#13: 2000-03-01 1 50 2000-03-01 2000 3 20
#14: 2000-03-02 1 50 2000-03-02 2000 3 20
#15: 2000-03-01 2 60 2000-03-01 2000 3 30
#16: 2000-03-03 2 60 2000-03-03 2000 3 30
Best answer
# Replace your sapply usage with pacman and you'll thank me
# pacman installs if needed, loads, and doesn't require quotation marks
pacman::p_load(data.table, lubridate)
DT <- fread('DATE, ID, Cells
2000-01-01, 1, 10
2000-01-02, 1, 10
2000-01-03, 1, 10
2000-01-01, 2, 20
2000-01-02, 2, 20
2000-01-03, 2, 20
2000-01-04, 2, 20
2000-02-01, 1, 30
2000-02-02, 1, 30
2000-02-01, 2, 40
2000-02-03, 2, 40
2000-02-04, 2, 40
2000-03-01, 1, 50
2000-03-02, 1, 50
2000-03-01, 2, 60
2000-03-03, 2, 60
')
DT$date <- ymd(DT$DATE)
DT$month <- format(DT$date, "%b")  # abbreviated month name; locale-dependent
# Shift Cells down by the number of January rows, padding the top with NA.
# This is a plain row shift: it ignores ID and relies on the rows being ordered by month.
n.jan <- sum(DT$month == "Jan")
DT$lag.cells <- c(rep(NA, n.jan), DT$Cells)[seq_len(nrow(DT))]
DT
DATE ID Cells date month lag.cells
1: 2000-01-01 1 10 2000-01-01 Jan NA
2: 2000-01-02 1 10 2000-01-02 Jan NA
3: 2000-01-03 1 10 2000-01-03 Jan NA
4: 2000-01-01 2 20 2000-01-01 Jan NA
5: 2000-01-02 2 20 2000-01-02 Jan NA
6: 2000-01-03 2 20 2000-01-03 Jan NA
7: 2000-01-04 2 20 2000-01-04 Jan NA
8: 2000-02-01 1 30 2000-02-01 Feb 10
9: 2000-02-02 1 30 2000-02-02 Feb 10
10: 2000-02-01 2 40 2000-02-01 Feb 10
11: 2000-02-03 2 40 2000-02-03 Feb 20
12: 2000-02-04 2 40 2000-02-04 Feb 20
13: 2000-03-01 1 50 2000-03-01 Mar 20
14: 2000-03-02 1 50 2000-03-02 Mar 20
15: 2000-03-01 2 60 2000-03-01 Mar 30
16: 2000-03-03 2 60 2000-03-03 Mar 30
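
For comparison, here is a minimal sketch of a per-ID calendar-month lag using a data.table update join. It is not the answer's method: the row shift above pads by the number of January rows and ignores ID, whereas this sketch matches each row to the same ID's value in the previous calendar month. The names ym, prev_ym, and monthly are illustrative only, and the sketch assumes Cells is constant within each ID and month, as it is in the sample data.

```r
library(data.table)

# Same sample data as in the question, built directly for self-containment
DT <- data.table(
  date = as.Date(c(
    "2000-01-01", "2000-01-02", "2000-01-03",
    "2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04",
    "2000-02-01", "2000-02-02",
    "2000-02-01", "2000-02-03", "2000-02-04",
    "2000-03-01", "2000-03-02",
    "2000-03-01", "2000-03-03"
  )),
  ID    = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2),
  Cells = c(10, 10, 10, 20, 20, 20, 20, 30, 30, 40, 40, 40, 50, 50, 60, 60)
)

# Month index that increases by 1 per calendar month (December -> January included)
DT[, ym := year(date) * 12L + month(date)]
DT[, prev_ym := ym - 1L]

# One row per ID and month (assumes Cells does not vary within an ID-month)
monthly <- unique(DT[, .(ID, ym, Cells)])

# Update join: for each row of DT, look up the same ID's value in the previous month
DT[monthly, Lagged.Cells := i.Cells, on = .(ID, prev_ym = ym)]

# Drop the helper columns
DT[, c("ym", "prev_ym") := NULL]
print(DT)
```

With the sample data, the February rows get 10 for ID 1 and 20 for ID 2, and the March rows get 30 and 40. Because the lookup is on the calendar month rather than on row position, an ID whose previous month is missing gets NA instead of an older value.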
Regarding "r - How to lag a value by a month when each month has a different number of observations?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35437545/