I have the following data.
df <- data.frame(
  CustID = c(1, 2, 3, 4, 5, 1, 5),
  CustName = c("Fred", "Maria", "John", "Mark", "Julia", "Fred", "Julia"),
  ServiceDate = c('2010-11-1', '2008-3-25', '2007-3-14', '2010-11-1',
                  '2008-3-25', '2010-12-14', '2008-3-14'),
  stringsAsFactors = FALSE
)
# convert the character column to Date
df$ServiceDate <- as.Date(df$ServiceDate, "%Y-%m-%d")
df
  CustID CustName ServiceDate
1      1     Fred  2010-11-01
2      2    Maria  2008-03-25
3      3     John  2007-03-14
4      4     Mark  2010-11-01
5      5    Julia  2008-03-25
6      1     Fred  2010-12-14
7      5    Julia  2008-03-14
I need a way to get the prior service date for each row, based on CustID and ServiceDate, so that I end up with something like this:
  CustID CustName ServiceDate PriorServiceDate
1      1     Fred  2010-11-01             <NA>
2      2    Maria  2008-03-25             <NA>
3      3     John  2007-03-14             <NA>
4      4     Mark  2010-11-01             <NA>
5      5    Julia  2008-03-25       2008-03-14
6      1     Fred  2010-12-14       2010-11-01
7      5    Julia  2008-03-14             <NA>
I tried using sqldf but had no success. Thanks.
Best answer
Using dplyr
I think this should solve your problem.
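library(dplyr)

df %>%
  arrange(CustID, ServiceDate) %>%   # sort explicitly; recent dplyr versions ignore groups in arrange()
  group_by(CustID) %>%
  mutate(PriorServiceDate = lag(ServiceDate))   # previous date within each customer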
Source: local data frame [7 x 4]
Groups: CustID

  CustID CustName ServiceDate PriorServiceDate
1      1     Fred  2010-11-01             <NA>
2      1     Fred  2010-12-14       2010-11-01
3      2    Maria  2008-03-25             <NA>
4      3     John  2007-03-14             <NA>
5      4     Mark  2010-11-01             <NA>
6      5    Julia  2008-03-14             <NA>
7      5    Julia  2008-03-25       2008-03-14
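Note that this assumes you want the immediately preceding date for each customer (which is what lag gives you), not the customer's earliest date. Your question does not make clear which one you need.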
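If you need to keep the original row order, as in the expected output in the question, one possible variation is to skip the sort and pass `order_by` to `lag()` instead. This is only a sketch, assuming dplyr's `lag()` and its `order_by` argument; the name `res` is just for illustration.

```r
library(dplyr)

# Same data as in the question, with dates built directly as Date values
df <- data.frame(
  CustID = c(1, 2, 3, 4, 5, 1, 5),
  CustName = c("Fred", "Maria", "John", "Mark", "Julia", "Fred", "Julia"),
  ServiceDate = as.Date(c("2010-11-01", "2008-03-25", "2007-03-14", "2010-11-01",
                          "2008-03-25", "2010-12-14", "2008-03-14")),
  stringsAsFactors = FALSE
)

# lag() is evaluated per customer; order_by makes it lag in date order
# while the rows themselves stay where they were
res <- df %>%
  group_by(CustID) %>%
  mutate(PriorServiceDate = lag(ServiceDate, order_by = ServiceDate)) %>%
  ungroup()

res
```

Here rows 5 and 6 (Julia on 2008-03-25 and Fred on 2010-12-14) get a prior date, and every other row gets NA, without reordering the data.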
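If you do want the minimum, compute min per customer and then set it to NA on the rows where the service date is itself the earliest one: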
df2 <- df %>%
  arrange(CustID, ServiceDate) %>%
  group_by(CustID) %>%
  mutate(PriorServiceDate = min(ServiceDate))   # earliest date per customer
# blank out the rows that are themselves the earliest visit
df2$PriorServiceDate[df2$ServiceDate == df2$PriorServiceDate] <- NA
df2
Source: local data frame [7 x 4]
Groups: CustID

  CustID CustName ServiceDate PriorServiceDate
1      1     Fred  2010-11-01             <NA>
2      1     Fred  2010-12-14       2010-11-01
3      2    Maria  2008-03-25             <NA>
4      3     John  2007-03-14             <NA>
5      4     Mark  2010-11-01             <NA>
6      5    Julia  2008-03-14             <NA>
7      5    Julia  2008-03-25       2008-03-14
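With the sample data the lag and min versions give identical results, because no customer has more than two visits. The difference only shows up from the third visit on. A small sketch with a made-up customer (the `visits` data, CustID 9, and the column names `PrevDate` and `FirstDate` are hypothetical, for illustration only):

```r
library(dplyr)

# Hypothetical customer with three visits
visits <- data.frame(
  CustID = c(9, 9, 9),
  ServiceDate = as.Date(c("2011-01-10", "2011-02-20", "2011-03-30"))
)

cmp <- visits %>%
  arrange(CustID, ServiceDate) %>%
  group_by(CustID) %>%
  mutate(
    # immediately preceding visit
    PrevDate = lag(ServiceDate),
    # earliest visit, set to NA on the earliest row itself
    FirstDate = if_else(ServiceDate == min(ServiceDate),
                        as.Date(NA), min(ServiceDate))
  ) %>%
  ungroup()

cmp
# PrevDate:  NA, 2011-01-10, 2011-02-20
# FirstDate: NA, 2011-01-10, 2011-01-10
```

For the third visit, lag points to the second visit while min still points to the first, so pick whichever matches what you mean by "prior".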
Regarding "r - getting the prior date based on criteria in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29151304/