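I have a dataset with two variables: a date and an individual's years of service (this is just a small reproducible example). I need to get the month in which the person started working (1989-06 in this example), bearing in mind that if the solution is applied to many people, the starting month may differ from person to person. Something like this: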
library(data.table)
dt <- structure(list(DATE = c("2009-01", "2009-02", "2009-03", "2009-04",
"2009-05", "2009-06", "2009-07", "2009-08", "2009-09", "2009-10",
"2009-11", "2009-12", "2010-01", "2010-02", "2010-03", "2010-04",
"2010-05", "2010-06", "2010-07", "2010-08", "2010-09", "2010-10",
"2010-11", "2010-12", "2011-01", "2011-02", "2011-03", "2011-04",
"2011-05", "2011-06", "2011-07", "2011-08", "2011-09", "2011-10",
"2011-11", "2011-12"), Years_service = c(19, 19, 19, 19, 19,
20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22),
INITIAL_MONTH = c("1989-06", "1989-06", "1989-06", "1989-06",
"1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06",
"1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06",
"1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06",
"1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06",
"1989-06", "1989-06", "1989-06", "1989-06", "1989-06", "1989-06",
"1989-06", "1989-06")), .Names = c("DATE", "Years_service",
"INITIAL_MONTH"), class = c("data.table", "data.frame"), row.names = c(NA,-36L))
head(dt)
      DATE Years_service INITIAL_MONTH
1: 2009-01            19       1989-06
2: 2009-02            19       1989-06
3: 2009-03            19       1989-06
4: 2009-04            19       1989-06
5: 2009-05            19       1989-06
6: 2009-06            20       1989-06
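How can I get this in R?
Best answer
We can find the first change in the Years_service column and subtract the years of service at that row from the DATE value at the same row.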
library(dplyr)
library(lubridate)

dt %>%
  # inds: first row where Years_service changes (assumes rows are sorted by DATE)
  mutate(inds = which.max(diff(Years_service) != 0) + 1,
         # subtract the years of service at that row from that row's date
         init_month = format(as.Date(paste0(DATE[inds], "-01")) -
                               years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)
#      DATE Years_service INITIAL_MONTH init_month
#1  2009-01            19       1989-06    1989-06
#2  2009-02            19       1989-06    1989-06
#3  2009-03            19       1989-06    1989-06
#4  2009-04            19       1989-06    1989-06
#....
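One thing to watch in the approach above: which.max() returns 1 when no element is TRUE, so a person whose Years_service never changes within the data would silently get a start month computed from row 2. As a minimal sketch, here is a base R helper that returns NA in that case. The name initial_month is hypothetical and not part of the original answer, and it assumes the input is already sorted by date.

```r
# Hypothetical helper (not from the original answer): returns the start month
# as "YYYY-MM", or NA when Years_service never changes in the data.
# Assumes `date` holds "YYYY-MM" strings sorted in ascending order.
initial_month <- function(date, years_service) {
  change <- which(diff(years_service) != 0)  # positions just before a change
  if (length(change) == 0) return(NA_character_)
  i <- change[1] + 1                         # first row with the new value
  d <- as.POSIXlt(as.Date(paste0(date[i], "-01")))
  d$year <- d$year - years_service[i]        # subtract whole years
  format(as.Date(d), "%Y-%m")
}

initial_month(c("2009-04", "2009-05", "2009-06", "2009-07"), c(19, 19, 20, 20))
# [1] "1989-06"
initial_month(c("2009-01", "2009-02"), c(19, 19))
# [1] NA
```

The year subtraction goes through POSIXlt only to keep the sketch free of package dependencies; lubridate's years() as used in the answer does the same job.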
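You will probably want to do this for several people. In that case, add a group_by clause (this assumes the data has a column identifying each person, here called person):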
dt %>%
  group_by(person) %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1,
         init_month = format(as.Date(paste0(DATE[inds], "-01")) -
                               years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)
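Since the question's data is a data.table, the same per-person logic can also be sketched with data.table's by= grouping instead of dplyr. This is an alternative to the answer's code, not the answer's own method. The toy data and the person column below are made up for illustration, and each person's rows are assumed to be sorted by date.

```r
library(data.table)

# Hypothetical two-person data; the `person` column is an assumption
toy <- data.table(
  person        = rep(c("A", "B"), each = 4),
  DATE          = rep(c("2009-04", "2009-05", "2009-06", "2009-07"), 2),
  Years_service = c(19, 19, 20, 20,  5, 6, 6, 6)
)

# For each person: find the first row where Years_service changes,
# then subtract that many years from the date on that row.
toy[, init_month := {
  i <- which.max(diff(Years_service) != 0) + 1
  d <- as.POSIXlt(as.Date(paste0(DATE[i], "-01")))
  d$year <- d$year - Years_service[i]
  format(as.Date(d), "%Y-%m")
}, by = person]

toy
```

With this toy data, person A changes from 19 to 20 in 2009-06 and person B from 5 to 6 in 2009-05, so the expected start months are 1989-06 and 2003-05 respectively.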
EDIT
For the updated case, where the rows are not in date order, we may need to sort by date first:
# example input with the rows in reverse date order
dt1 <- dt[order(-DATE)]

dt1 %>%
  # convert to Date so that arrange() sorts chronologically
  mutate(dates = as.Date(paste0(DATE, "-01"))) %>%
  arrange(dates) %>%
  mutate(inds = which.max(diff(Years_service) != 0) + 1,
         init_month = format(dates[inds] - years(Years_service[inds]), "%Y-%m")) %>%
  select(-inds)
Source: a similar question on Stack Overflow about getting the initial month from a list of dates in R: https://stackoverflow.com/questions/56250532/