r - How to lag values by one month when each month has a different number of observations?

Tags: r, data.table

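I have a dataset covering multiple dates, and I want to lag the value of Cells by one month. I probably can't use shift() directly, because each month has a different number of days (not to mention some missing dates).

What I did was create a new data table with the unique year-month combinations, shift/lag Cells there, and then merge it back into the original data table (taking care not to end up with duplicate columns).

Obviously, this is not efficient. Is there another way?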

sapply(c('data.table', 'lubridate'), require, character.only = TRUE)

DT <- fread('DATE, ID, Cells
2000-01-01, 1, 10
2000-01-02, 1, 10
2000-01-03, 1, 10
2000-01-01, 2, 20
2000-01-02, 2, 20
2000-01-03, 2, 20
2000-01-04, 2, 20
2000-02-01, 1, 30
2000-02-02, 1, 30
2000-02-01, 2, 40
2000-02-03, 2, 40
2000-02-04, 2, 40
2000-03-01, 1, 50
2000-03-02, 1, 50
2000-03-01, 2, 60
2000-03-03, 2, 60
')


DT[, date := as.Date(DATE, format = '%Y-%m-%d')][,
           c('Year', 'Month') := .(year(date), month(date))]

setkey(DT, Year, Month, ID)

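# Keep one row per (Year, Month, ID). `by` is given explicitly because
# recent data.table versions compare all columns by default, not just the key.
DT.Months <- unique(DT, by = key(DT))[, 
               .(Year, Month, ID, Cells)]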

DT.Months[, `:=`(Lagged.Cells = 
          shift(Cells, 1L, type = 'lag')), by = .(ID)]

DT <- DT[DT.Months][, `:=`(i.Cells, NULL)]

# > DT # This is what I want. 
# The Value in Cells is lagged by one month, 
# regardless of the number of observations within a month for each ID.
#          DATE ID Cells       date Year Month Lagged.Cells
# 1: 2000-01-01  1    10 2000-01-01 2000     1           NA
# 2: 2000-01-02  1    10 2000-01-02 2000     1           NA
# 3: 2000-01-03  1    10 2000-01-03 2000     1           NA
# 4: 2000-01-01  2    20 2000-01-01 2000     1           NA
# 5: 2000-01-02  2    20 2000-01-02 2000     1           NA
# 6: 2000-01-03  2    20 2000-01-03 2000     1           NA
# 7: 2000-01-04  2    20 2000-01-04 2000     1           NA
# 8: 2000-02-01  1    30 2000-02-01 2000     2           10
# 9: 2000-02-02  1    30 2000-02-02 2000     2           10
#10: 2000-02-01  2    40 2000-02-01 2000     2           20
#11: 2000-02-03  2    40 2000-02-03 2000     2           20
#12: 2000-02-04  2    40 2000-02-04 2000     2           20
#13: 2000-03-01  1    50 2000-03-01 2000     3           30
#14: 2000-03-02  1    50 2000-03-02 2000     3           30
#15: 2000-03-01  2    60 2000-03-01 2000     3           40
#16: 2000-03-03  2    60 2000-03-03 2000     3           40

Best answer

# Replace your sapply usage with pacman and you'll thank me
#   pacman installs if needed, loads, and doesn't require quotation marks
pacman::p_load(data.table, lubridate) 

DT <- fread('DATE, ID, Cells
            2000-01-01, 1, 10
            2000-01-02, 1, 10
            2000-01-03, 1, 10
            2000-01-01, 2, 20
            2000-01-02, 2, 20
            2000-01-03, 2, 20
            2000-01-04, 2, 20
            2000-02-01, 1, 30
            2000-02-02, 1, 30
            2000-02-01, 2, 40
            2000-02-03, 2, 40
            2000-02-04, 2, 40
            2000-03-01, 1, 50
            2000-03-02, 1, 50
            2000-03-01, 2, 60
            2000-03-03, 2, 60
            ')
DT$date      <- ymd(DT$DATE)
DT$month     <- format(DT$date, "%b")
# Shift Cells down by the number of January rows, then trim to the original
# length. This is a purely positional shift: it does not distinguish IDs.
n.jan        <- sum(DT$month == "Jan")
DT$lag.cells <- c(rep(NA, n.jan), DT$Cells)[seq_len(nrow(DT))]
DT

          DATE ID Cells       date month lag.cells
 1: 2000-01-01  1    10 2000-01-01   Jan        NA
 2: 2000-01-02  1    10 2000-01-02   Jan        NA
 3: 2000-01-03  1    10 2000-01-03   Jan        NA
 4: 2000-01-01  2    20 2000-01-01   Jan        NA
 5: 2000-01-02  2    20 2000-01-02   Jan        NA
 6: 2000-01-03  2    20 2000-01-03   Jan        NA
 7: 2000-01-04  2    20 2000-01-04   Jan        NA
 8: 2000-02-01  1    30 2000-02-01   Feb        10
 9: 2000-02-02  1    30 2000-02-02   Feb        10
10: 2000-02-01  2    40 2000-02-01   Feb        10
11: 2000-02-03  2    40 2000-02-03   Feb        20
12: 2000-02-04  2    40 2000-02-04   Feb        20
13: 2000-03-01  1    50 2000-03-01   Mar        20
14: 2000-03-02  1    50 2000-03-02   Mar        20
15: 2000-03-01  2    60 2000-03-01   Mar        30
16: 2000-03-03  2    60 2000-03-03   Mar        30
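
The shift above offsets the whole Cells column by the number of January rows, so values are assigned by row position rather than by ID. If the goal is to give each row the previous calendar month's value for the same ID, one possible alternative is an update join on a running month index. This is only a minimal sketch, and it assumes Cells is constant within each ID-month, as it is in the sample data. The names `month.idx` and `monthly` are introduced here for illustration and are not part of the original post.

```r
library(data.table)

DT <- data.table(
  DATE  = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                    "2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04",
                    "2000-02-01", "2000-02-02",
                    "2000-02-01", "2000-02-03", "2000-02-04",
                    "2000-03-01", "2000-03-02",
                    "2000-03-01", "2000-03-03")),
  ID    = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2),
  Cells = c(10, 10, 10, 20, 20, 20, 20, 30, 30, 40, 40, 40, 50, 50, 60, 60)
)

# Running month index, so December -> January is also a one-month step.
DT[, month.idx := year(DATE) * 12L + month(DATE)]

# One row per (ID, month), relabelled with the month that should receive
# the value (assumption: Cells does not vary within an ID-month).
monthly <- unique(DT, by = c("ID", "month.idx"))[
  , .(ID, month.idx = month.idx + 1L, Lagged.Cells = Cells)]

# Update join: adds Lagged.Cells to DT by reference; row order is unchanged
# and rows with no data in the previous month get NA.
DT[monthly, on = .(ID, month.idx), Lagged.Cells := i.Lagged.Cells]
DT[, month.idx := NULL]

print(DT)
```

With this approach, a month that follows a gap gets NA instead of the value from the last available month. Applying `shift()` to the table of unique months would carry the earlier value across the gap.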

For more on "r - How to lag values by one month when each month has a different number of observations?", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/35437545/
