I have a data frame as follows:
DATE <- as.Date(c('2016-12-01', '2016-12-02', '2016-12-03', '2016-12-04', '2016-12-01', '2016-12-03', '2016-12-04', '2016-12-04' ))
Parent <- c('A','A','A','A','A','A','A','B')
Child <- c('ab', 'ab', 'ab', 'ab', 'ac','ac', 'ac','bd')
salary <- c(1000, 100, 4000, 2000,1000,3455,1234,600)
avg_child_salary <- c(500, 500, 500, 500, 300, 300, 300, 9000)
Callout <- c('HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)
employ.data
        DATE Parent Child avg_child_salary salary Callout
1 2016-12-01      A    ab              500   1000    HIGH
2 2016-12-02      A    ab              500    100     LOW
3 2016-12-03      A    ab              500   4000    HIGH
4 2016-12-04      A    ab              500   2000    HIGH
5 2016-12-01      A    ac              300   1000    HIGH
6 2016-12-03      A    ac              300   3455    HIGH
7 2016-12-04      A    ac              300   1234    HIGH
8 2016-12-04      B    bd             9000    600     LOW
I have filtered it down to yesterday's rows (2016-12-04) as follows:
library(dplyr)  # provides filter()
yesterday <- Sys.Date() - 1  # already a Date; this was 2016-12-04 when the question was asked
df2 <- filter(employ.data, DATE == yesterday)
df2
        DATE Parent Child avg_child_salary salary Callout
4 2016-12-04      A    ab              500   2000    HIGH
7 2016-12-04      A    ac              300   1234    HIGH
8 2016-12-04      B    bd             9000    600     LOW
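The filter above only returns these rows when it is run on 2016-12-05, because Sys.Date() changes every day. A minimal sketch of a reproducible version, assuming a fixed date and a small stand-in data frame (both are illustrative, not part of the original post):

```r
library(dplyr)

# Small stand-in for employ.data with only the columns needed here (illustrative)
employ.data <- data.frame(
  DATE  = as.Date(c("2016-12-03", "2016-12-04", "2016-12-04")),
  Child = c("ab", "ab", "bd")
)

# Assumption: pin "yesterday" to a fixed date so the result does not depend on the run date
yesterday <- as.Date("2016-12-04")

# Keep only the rows dated yesterday
df2 <- dplyr::filter(employ.data, DATE == yesterday)
df2
```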
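My goal is to add a column next to Callout showing, for each Child, the number of consecutive days up to and including 2016-12-04 that the Callout has been HIGH (or LOW), based on the employ.data data frame. This is the final output I need: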
        DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
4 2016-12-04      A    ab              500   2000    HIGH                         2
7 2016-12-04      A    ac              300   1234    HIGH                         2
8 2016-12-04      B    bd             9000    600     LOW                         1
Thanks!
Best answer
Try this, mate:
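library(dplyr)
library(lubridate)
# Start from the full employ.data: df2 has one row per Child, so lag() would find no previous day
df3 <- employ.data %>%
  arrange(DATE) %>%
  group_by(Child, Callout) %>%  # column names are case-sensitive
  mutate(DATE = ymd(DATE),
         # 1 when this row is exactly one day after the previous row in its group, else 0
         consecutive_day_flag = if_else(DATE == (lag(DATE) + days(1)), 1, 0, missing = 0),
         # number of day-to-day continuations within the group (not yet the streak length)
         how_many = sum(consecutive_day_flag))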
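Note that how_many counts every pair of adjacent days within a Child/Callout group, which is not quite the same as the length of the run ending yesterday. A minimal sketch of one way to count that run directly is below. It is an alternative to the code above, not the answerer's method; it assumes a fixed yesterday for reproducibility, and the names streaks, days_back, on_streak and consec_days are made up for illustration.

```r
library(dplyr)

employ.data <- data.frame(
  DATE = as.Date(c("2016-12-01", "2016-12-02", "2016-12-03", "2016-12-04",
                   "2016-12-01", "2016-12-03", "2016-12-04", "2016-12-04")),
  Parent = c("A", "A", "A", "A", "A", "A", "A", "B"),
  Child = c("ab", "ab", "ab", "ab", "ac", "ac", "ac", "bd"),
  avg_child_salary = c(500, 500, 500, 500, 300, 300, 300, 9000),
  salary = c(1000, 100, 4000, 2000, 1000, 3455, 1234, 600),
  Callout = c("HIGH", "LOW", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "LOW")
)

# Assumption: fixed date instead of Sys.Date() - 1 so the sketch is reproducible
yesterday <- as.Date("2016-12-04")

# For each Child, walk backwards from yesterday and stop at the first
# missing day or the first change in Callout.
streaks <- employ.data %>%
  filter(DATE <= yesterday) %>%
  group_by(Child) %>%
  arrange(desc(DATE), .by_group = TRUE) %>%
  mutate(
    days_back = as.integer(yesterday - DATE),
    # the k-th most recent row must be exactly k - 1 days before yesterday
    # and carry the same Callout as the most recent row
    on_streak = days_back == row_number() - 1L & Callout == first(Callout)
  ) %>%
  # cumprod() drops to 0 at the first FALSE, so the sum is the run length
  summarise(consec_days = sum(cumprod(on_streak)), .groups = "drop")

# Attach the run length to yesterday's rows
df2 <- employ.data %>%
  filter(DATE == yesterday) %>%
  left_join(streaks, by = "Child")
df2
```

On the sample data this should give 2 for ab, 2 for ac (2016-12-02 is missing, which breaks the run) and 1 for bd, matching the output requested in the question.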
This question about counting consecutive days by group in R comes from Stack Overflow: https://stackoverflow.com/questions/40983647/