r - Get the initial month from a list of dates

Tags: r data.table

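I have a dataset with two variables: the date and a person's years of service (just to make a small reproducible example). I need to get the month in which this person started working (1989-06 in this example), keeping in mind that if the solution is applied to many people, the starting month may differ from person to person. Something like this: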

library(data.table)
dt <- structure(list(DATE = c("2009-01", "2009-02", "2009-03", "2009-04", 
                          "2009-05", "2009-06", "2009-07", "2009-08", "2009-09", "2009-10", 
                          "2009-11", "2009-12", "2010-01", "2010-02", "2010-03", "2010-04", 
                          "2010-05", "2010-06", "2010-07", "2010-08", "2010-09", "2010-10", 
                          "2010-11", "2010-12", "2011-01", "2011-02", "2011-03", "2011-04", 
                          "2011-05", "2011-06", "2011-07", "2011-08", "2011-09", "2011-10", 
                          "2011-11", "2011-12"), Years_service = c(19, 19, 19, 19, 19, 
                                                                   20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 
                                                                   21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22), 
                 INITIAL_MONTH = c("1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06")), .Names = c("DATE", "Years_service", 
                                                                      "INITIAL_MONTH"), class = c("data.table", "data.frame"), row.names = c(NA,-36L))

head(dt)
      DATE Years_service INITIAL_MONTH
1: 2009-01            19       1989-06
2: 2009-02            19       1989-06
3: 2009-03            19       1989-06
4: 2009-04            19       1989-06
5: 2009-05            19       1989-06
6: 2009-06            20       1989-06

How can I get this in R?

Best answer

We can find the first row where Years_service changes and subtract that number of years from the DATE value at that index.

library(dplyr)
library(lubridate)

dt %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1, 
        init_month = format(as.Date(paste0(DATE[inds], "-01")) - 
                      years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)

#      DATE Years_service INITIAL_MONTH init_month
#1  2009-01            19       1989-06    1989-06
#2  2009-02            19       1989-06    1989-06
#3  2009-03            19       1989-06    1989-06
#4  2009-04            19       1989-06    1989-06
#....
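
To see why this works, the intermediate values can be inspected in base R. This is a minimal sketch on a shortened copy of the example columns; the variable names (date, yrs, changed, idx, start) are illustrative and not part of the original answer, and seq() on a Date is used here in place of lubridate's years().

```r
# Shortened copy of the example columns (first 8 months only)
date <- c("2009-01", "2009-02", "2009-03", "2009-04",
          "2009-05", "2009-06", "2009-07", "2009-08")
yrs  <- c(19, 19, 19, 19, 19, 20, 20, 20)

# diff() != 0 is TRUE where the value changes between consecutive rows;
# which.max() on a logical vector returns the position of the first TRUE
changed <- diff(yrs) != 0
idx <- which.max(changed) + 1   # row where the new value first appears

# That row is a service anniversary: step back yrs[idx] whole years from it
anniversary <- as.Date(paste0(date[idx], "-01"))
start <- seq(anniversary, by = paste0("-", yrs[idx], " years"), length.out = 2)[2]
format(start, "%Y-%m")
```

One caveat: if Years_service never changes within the observed window, which.max() still returns 1, so idx would be 2 and the result would be meaningless. Checking any(changed) first is a cheap guard.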

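You will probably want to do this for several people; in that case, add a group_by clause (person here stands for whatever column identifies each individual; the sample data has no such column).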

dt %>%
  group_by(person) %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1, 
         init_month = format(as.Date(paste0(DATE[inds], "-01")) - 
                       years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)
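
Since the question is tagged data.table, the same per-person logic can also be sketched with by =. The toy data, the person column, and the helper name first_month are assumptions made for illustration; they do not come from the original post.

```r
library(data.table)

# Toy data: two hypothetical people with different start months
toy <- data.table(
  person = rep(c("A", "B"), each = 4),
  DATE = c("2009-04", "2009-05", "2009-06", "2009-07",
           "2009-01", "2009-02", "2009-03", "2009-04"),
  Years_service = c(19, 19, 20, 20,
                    4, 4, 5, 5)
)

# Hypothetical helper: month of the first change, minus the years reached there
first_month <- function(date, yrs) {
  idx <- which.max(diff(yrs) != 0) + 1
  anniversary <- as.Date(paste0(date[idx], "-01"))
  start <- seq(anniversary, by = paste0("-", yrs[idx], " years"), length.out = 2)[2]
  format(start, "%Y-%m")
}

setorder(toy, person, DATE)  # "YYYY-MM" strings sort chronologically
# One value per group, recycled to every row of that group
toy[, init_month := first_month(DATE, Years_service), by = person]
toy
```

Each group gets its own first-change index, so people who started in different months receive different init_month values.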

Edit

For the updated case, we may need to sort the rows by date first.

dt1 <- dt[order(-DATE)]

dt1 %>%
  mutate(dates = as.Date(paste0(DATE, "-01"))) %>%
  arrange(dates) %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1, 
     init_month = format(dates[inds] - years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)
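
The reason sorting matters can be shown with a small base R sketch: the first-change index only points at an anniversary month when the rows are in chronological order. The data frame df and the helper first_month are hypothetical names used for illustration only.

```r
# Toy rows deliberately stored in reverse chronological order
df <- data.frame(
  DATE = c("2009-08", "2009-07", "2009-06", "2009-05",
           "2009-04", "2009-03", "2009-02", "2009-01"),
  Years_service = c(20, 20, 20, 19, 19, 19, 19, 19)
)

# Hypothetical helper wrapping the first-change logic
first_month <- function(date, yrs) {
  idx <- which.max(diff(yrs) != 0) + 1
  anniversary <- as.Date(paste0(date[idx], "-01"))
  start <- seq(anniversary, by = paste0("-", yrs[idx], " years"), length.out = 2)[2]
  format(start, "%Y-%m")
}

# On unsorted rows the first change lands on the last month of the old value
unsorted <- first_month(df$DATE, df$Years_service)

# "YYYY-MM" strings sort chronologically, so order() on DATE is enough
sorted_df <- df[order(df$DATE), ]
sorted <- first_month(sorted_df$DATE, sorted_df$Years_service)

c(unsorted = unsorted, sorted = sorted)
```

On these toy rows the unsorted call yields 1990-05, while the sorted call recovers 1989-06.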

Regarding "r - Get the initial month from a list of dates", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56250532/
