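I want to define an indicator within each household that tells whether a driver is available to give a passenger a ride. A driver is available if his/her trip starts at most 1 hour after the passenger's trip.
Example: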
household person mode start
1 1 car 7:20
1 1 car 8:00
1 1 car 8:30
1 2 non-car 7:30
1 3 non-car 7:15
1 4 car 7:00
2 1 car 7:00
2 2 non-car 9:00
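In the first household, the driver can give a ride because his trip starts 30 minutes after the second person's, and he can also give the third person a ride. In the second household, nobody can be matched.
Output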
household person mode start indicator
1 1 car 8:00 1
1 2 non-car 7:30 1
1 3 non-car 7:15 1
2 1 car 7:00 0
2 2 non-car 9:00 0
Then I want to place the matched rows (indicator equal to 1) side by side.
Output
household person mode start indicator household person mode start indicator
1 1 car 8:00 1 2 2 non-car 7:30 1
1 1 car 8:00 1 3 2 non-car 7:15 1
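The availability rule stated in the question (the driver's trip starts no earlier than the passenger's and at most 1 hour after it) can be written down directly. The following is only a minimal sketch in base R; the helper names `to_minutes` and `driver_available` and the `window` parameter are hypothetical and are not part of the question or the answer.

```r
# Convert "H:MM" strings to minutes after midnight (hypothetical helper).
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

# TRUE when the driver's trip starts 0 to `window` minutes after the passenger's.
driver_available <- function(passenger_start, driver_start, window = 60L) {
  gap <- to_minutes(driver_start) - to_minutes(passenger_start)
  gap >= 0L & gap <= window
}

driver_available("7:30", "8:00")  # driver leaves 30 minutes after the passenger
driver_available("9:00", "7:00")  # driver leaves before the passenger
```

Working in whole minutes avoids any dependence on the current date or on time-difference units.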
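Best answer
We convert 'start' to a datetime class with as.POSIXct, group by 'household', check whether the diff (difference) of 'start' is less than or equal to 1, and convert the result to 0/1 with as.integer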
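library(dplyr)
df1 %>%
  # parse the "H:MM" strings as date-times (the current date is attached)
  mutate(start = as.POSIXct(start, format = '%H:%M')) %>%
  group_by(household) %>%
  # 1 if any consecutive difference in start time, in hours, is <= 1 within the household
  mutate(indicator = as.integer(any(as.numeric(diff(start), units = "hours") <= 1)))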
# A tibble: 4 x 5
# Groups: household [2]
# household person mode start indicator
# <int> <int> <chr> <dttm> <int>
#1 1 1 car 2019-09-03 08:00:00 1
#2 1 2 non-car 2019-09-03 07:30:00 1
#3 2 1 car 2019-09-03 07:00:00 0
#4 2 2 non-car 2019-09-03 09:00:00 0
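The check above compares consecutive rows and does not look at who is the driver. As an alternative, here is a hedged sketch in base R that follows the question's wording more literally: a row gets indicator 1 when it takes part in at least one passenger/driver pair where the car trip starts 0 to 60 minutes after the non-car trip. The names `to_minutes`, `flag_household`, and `window` are hypothetical, and this is a different technique from the `diff()` approach of the answer.

```r
df1 <- data.frame(
  household = c(1L, 1L, 2L, 2L),
  person    = c(1L, 2L, 1L, 2L),
  mode      = c("car", "non-car", "car", "non-car"),
  start     = c("8:00", "7:30", "7:00", "9:00"),
  stringsAsFactors = FALSE
)

# Convert "H:MM" strings to minutes after midnight (hypothetical helper).
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

# Flag the rows of one household that belong to at least one valid pair.
flag_household <- function(d, window = 60L) {
  mins   <- to_minutes(d$start)
  is_car <- d$mode == "car"
  # gap[i, j] = start of row j minus start of row i, in minutes
  gap <- outer(mins, mins, function(p, q) q - p)
  # pair_ok[i, j]: row i is a passenger, row j a driver starting 0..window minutes later
  pair_ok <- outer(!is_car, is_car, "&") & gap >= 0L & gap <= window
  d$indicator <- as.integer(rowSums(pair_ok) > 0 | colSums(pair_ok) > 0)
  d
}

result <- do.call(rbind, lapply(split(df1, df1$household), flag_household))
rownames(result) <- NULL
result
```

On the four-row `df1` from the answer this flags both rows of household 1 and neither row of household 2, the same as the output shown above.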
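To get the second output, we can use pivot_wider from tidyr (in the development version at the time of writing; it has been part of released tidyr since version 1.0.0)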
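library(dplyr)
library(tidyr)
df1 %>%
  # keep the original 'start' strings; parse a copy for the comparison
  mutate(startn = as.POSIXct(start, format = '%H:%M')) %>%
  group_by(household) %>%
  mutate(indicator = as.integer(any(as.numeric(diff(startn), units = "hours") <= 1))) %>%
  filter(indicator == 1) %>%
  select(-startn) %>%
  group_by(household) %>%
  # number the rows within each household so they can be spread into columns
  mutate(n = row_number()) %>%
  pivot_wider(names_from = n, values_from = c(household, person, mode, start, indicator))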
# A tibble: 1 x 10
# household_1 household_2 person_1 person_2 mode_1 mode_2 start_1 start_2 indicator_1 indicator_2
# <int> <int> <int> <int> <chr> <chr> <chr> <chr> <int> <int>
#1 1 1 1 2 car non-car 8:00 7:30 1 1
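If one row per driver/passenger pair is wanted, as in the question's second output, a self-join within the household is another option. This is only a sketch in base R under the question's rule (driver starts 0 to 60 minutes after the passenger); `to_minutes` and the `_driver` / `_passenger` suffixes are hypothetical names chosen for illustration. Note that `merge()` keeps a single `household` column rather than repeating it on both sides.

```r
df1 <- data.frame(
  household = c(1L, 1L, 2L, 2L),
  person    = c(1L, 2L, 1L, 2L),
  mode      = c("car", "non-car", "car", "non-car"),
  start     = c("8:00", "7:30", "7:00", "9:00"),
  stringsAsFactors = FALSE
)

# Convert "H:MM" strings to minutes after midnight (hypothetical helper).
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

drivers    <- df1[df1$mode == "car", ]
passengers <- df1[df1$mode != "car", ]

# Pair every driver with every passenger of the same household.
pairs <- merge(drivers, passengers, by = "household",
               suffixes = c("_driver", "_passenger"))

# Keep only pairs where the driver starts 0 to 60 minutes after the passenger.
gap <- to_minutes(pairs$start_driver) - to_minutes(pairs$start_passenger)
matched <- pairs[gap >= 0 & gap <= 60, ]
matched
```

With the four-row `df1`, only the pair from household 1 (driver 1 at 8:00, passenger 2 at 7:30) remains.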
Data
df1 <- structure(list(household = c(1L, 1L, 2L, 2L), person = c(1L,
2L, 1L, 2L), mode = c("car", "non-car", "car", "non-car"), start = c("8:00",
"7:30", "7:00", "9:00")), class = "data.frame", row.names = c(NA,
-4L))
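The `df1` above is a shortened version of the question's example. To experiment with the full eight-row table from the question, it can be read from text; this is a small sketch, and the object names `example_txt` and `trips` are hypothetical.

```r
# The example table exactly as given in the question.
example_txt <- "household person mode start
1 1 car 7:20
1 1 car 8:00
1 1 car 8:30
1 2 non-car 7:30
1 3 non-car 7:15
1 4 car 7:00
2 1 car 7:00
2 2 non-car 9:00"

# 'start' stays a character column because "7:20" is not numeric.
trips <- read.table(text = example_txt, header = TRUE, stringsAsFactors = FALSE)
str(trips)
```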
Regarding "r - finding overlapping times within a group", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57773866/