r - Get the initial month from a list of dates

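I have a dataset with two variables: the date and an individual's years of service (just to make a small reproducible example). I need to get the month in which the person started working (1989-06 in this example), bearing in mind that if the solution is applied to many people, the starting month can differ from person to person. Something like this: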

library(data.table)
dt <- structure(list(DATE = c("2009-01", "2009-02", "2009-03", "2009-04", 
                          "2009-05", "2009-06", "2009-07", "2009-08", "2009-09", "2009-10", 
                          "2009-11", "2009-12", "2010-01", "2010-02", "2010-03", "2010-04", 
                          "2010-05", "2010-06", "2010-07", "2010-08", "2010-09", "2010-10", 
                          "2010-11", "2010-12", "2011-01", "2011-02", "2011-03", "2011-04", 
                          "2011-05", "2011-06", "2011-07", "2011-08", "2011-09", "2011-10", 
                          "2011-11", "2011-12"), Years_service = c(19, 19, 19, 19, 19, 
                                                                   20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 
                                                                   21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22), 
                 INITIAL_MONTH = c("1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06", 
                                   "1989-06", "1989-06")), .Names = c("DATE", "Years_service", 
                                                                      "INITIAL_MONTH"), class = c("data.table", "data.frame"), row.names = c(NA,-36L))

head(dt)
      DATE Years_service INITIAL_MONTH
1: 2009-01            19       1989-06
2: 2009-02            19       1989-06
3: 2009-03            19       1989-06
4: 2009-04            19       1989-06
5: 2009-05            19       1989-06
6: 2009-06            20       1989-06

How can I get this in R?

Best answer

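We can find the first change in the Years_service column, take the DATE value at that index (the month in which the service anniversary falls), and subtract the Years_service value at that index from it.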

library(dplyr)
library(lubridate)

dt %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1, 
        init_month = format(as.Date(paste0(DATE[inds], "-01")) - 
                      years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)

#      DATE Years_service INITIAL_MONTH init_month
#1  2009-01            19       1989-06    1989-06
#2  2009-02            19       1989-06    1989-06
#3  2009-03            19       1989-06    1989-06
#4  2009-04            19       1989-06    1989-06
#....
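
To make the individual steps easier to follow, here is a minimal base R sketch of the same idea on a four-row toy vector. The toy values and the names changed, idx, anniversary and start_year are illustrative assumptions, not part of the original answer, and the year arithmetic is done on the "YYYY-MM" string directly instead of with lubridate.

```r
# Toy data (assumed for illustration): service years tick over in 2009-06
DATE <- c("2009-04", "2009-05", "2009-06", "2009-07")
Years_service <- c(19, 19, 20, 20)

# TRUE wherever the years of service differ from the previous row
changed <- diff(Years_service) != 0

# which.max() returns the position of the first TRUE; diff() drops one
# element, so add 1 to get the row where the new value first appears
idx <- which.max(changed) + 1

# The month at that row is the service anniversary month
anniversary <- DATE[idx]

# Subtract the completed years from the year part, keep the "-MM" part
start_year <- as.integer(substr(anniversary, 1, 4)) - Years_service[idx]
init_month <- paste0(start_year, substr(anniversary, 5, 7))

init_month
```

Note that which.max() returns 1 when no element is TRUE, so this only gives a meaningful result if Years_service changes at least once within the observed dates.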

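If you want to do this for several people, you can add a group_by clause (person here stands for whatever column identifies each individual; it is not in the sample data):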

dt %>%
  group_by(person) %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1, 
         init_month = format(as.Date(paste0(DATE[inds], "-01")) - 
                       years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)
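
As a rough per-person sketch in base R, the helper below also guards against the case where a person's Years_service never changes in the data; there the first-change approach has nothing to anchor on, so it returns NA instead of a misleading month. The function name first_change_month, the people data frame and its person column are hypothetical and only serve the illustration.

```r
# Hypothetical helper: start month from the first change in years of service.
# Assumes `date` holds "YYYY-MM" strings in chronological order.
first_change_month <- function(date, years_service) {
  changed <- which(diff(years_service) != 0)
  if (length(changed) == 0) {
    return(NA_character_)  # no anniversary observed for this person
  }
  idx <- changed[1] + 1
  start_year <- as.integer(substr(date[idx], 1, 4)) - years_service[idx]
  paste0(start_year, substr(date[idx], 5, 7))
}

# Made-up data for three people; C never changes within the window
people <- data.frame(
  person = rep(c("A", "B", "C"), times = c(4, 4, 3)),
  DATE = c("2009-04", "2009-05", "2009-06", "2009-07",
           "2009-01", "2009-02", "2009-03", "2009-04",
           "2010-01", "2010-02", "2010-03"),
  Years_service = c(19, 19, 20, 20, 5, 5, 6, 6, 3, 3, 3),
  stringsAsFactors = FALSE
)

# One start month per person, as a named character vector
init <- sapply(split(people, people$person),
               function(d) first_change_month(d$DATE, d$Years_service))
init
```

The result can be merged back onto the original rows if a per-row column like INITIAL_MONTH is needed.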

Edit

For the updated case, where the rows are not in chronological order, we need to sort by date first:

dt1 <- dt[order(-DATE)]

dt1 %>%
  mutate(dates = as.Date(paste0(DATE, "-01"))) %>%
  arrange(dates) %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1, 
     init_month = format(dates[inds] - years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)
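
A small sketch of why the ordering matters, using made-up rows in descending order: without sorting, the first change is found from the wrong end and the result is off. It assumes DATE is always a zero-padded "YYYY-MM" string, in which case sorting the strings is equivalent to sorting the dates. The helper name init_from is hypothetical.

```r
# Made-up rows in descending date order (assumed for illustration)
DATE <- c("2009-07", "2009-06", "2009-05", "2009-04")
Years_service <- c(20, 20, 19, 19)

# Hypothetical helper: first-change logic on "YYYY-MM" strings
init_from <- function(date, years_service) {
  idx <- which.max(diff(years_service) != 0) + 1
  paste0(as.integer(substr(date[idx], 1, 4)) - years_service[idx],
         substr(date[idx], 5, 7))
}

# Unsorted: picks 2009-05 with 19 years, which is not the anniversary
unsorted_result <- init_from(DATE, Years_service)

# Sorted ascending: picks 2009-06 with 20 years
o <- order(DATE)
sorted_result <- init_from(DATE[o], Years_service[o])

c(unsorted = unsorted_result, sorted = sorted_result)
```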

Regarding "r - Get the initial month from a list of dates", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56250532/
