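I have a dataset with the start and end times of each use of a service, about 1000 rows in total. I need to calculate how many hours fall within any given time interval.
Dataset: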
   court_number          start_time            end_time  service
1:      court 2 2020-03-01 11:00:00 2020-03-01 12:30:00  booking
2:      court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching
3:      court 1 2020-03-01 11:00:00 2020-03-01 13:00:00  booking
4:      court 5 2020-03-01 12:00:00 2020-03-01 16:00:00  booking
5:      court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching
library(data.table)
dt <- data.table(court_number = c('court 2','court 3','court 1','court 5','court 5'),
                 start_time = c('2020-03-01 11:00:00', '2020-03-01 12:30:00', '2020-03-01 11:00:00', '2020-03-01 12:00:00', '2020-03-01 16:30:00'),
                 end_time = c('2020-03-01 12:30:00', '2020-03-01 13:30:00', '2020-03-01 13:00:00', '2020-03-01 16:00:00', '2020-03-01 18:30:00'),
                 service = c('booking','coaching','booking','booking','coaching'))
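For example, I want to calculate the number of hours between 12:00 and 17:00, so I need to create a column holding each booking's hours that fall between 12:00 and 17:00: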
   court_number          start_time            end_time  service interval_12_17
1:      court 2 2020-03-01 11:00:00 2020-03-01 12:30:00  booking            0.5
2:      court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching              1
3:      court 1 2020-03-01 11:00:00 2020-03-01 13:00:00  booking              1
4:      court 5 2020-03-01 12:00:00 2020-03-01 16:00:00  booking              4
5:      court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching            0.5
I've read many similar solved questions on Stack Overflow, but since I'm new to R, they haven't helped me =)
Best answer
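Convert the "time" columns to a datetime class (POSIXct). Then, for each row, take the difftime between two values: the pmin of the time of day of "end_time" and "17:00:00", and the pmax of the time of day of "start_time" and "12:00:00". For bookings that overlap the window, this clips each booking to 12:00 to 17:00 and reports the remaining length in hours.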
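Written out, each row's value is the formula below. Here s and e are the start and end times of day. The result equals the overlap of the booking with the 12:00 to 17:00 window whenever the two intersect. Worked through for rows 1 and 5 of the sample data:

```latex
\begin{aligned}
h   &= \min(e,\ 17{:}00) - \max(s,\ 12{:}00)\\
h_1 &= \min(12{:}30,\ 17{:}00) - \max(11{:}00,\ 12{:}00) = 12{:}30 - 12{:}00 = 0.5\ \text{h}\\
h_5 &= \min(18{:}30,\ 17{:}00) - \max(16{:}30,\ 12{:}00) = 17{:}00 - 16{:}30 = 0.5\ \text{h}
\end{aligned}
```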
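library(dplyr)
library(lubridate)
library(data.table)
dt %>%
  # parse the character columns start_time and end_time into POSIXct
  mutate(across(ends_with("time"), ymd_hms)) %>%
  # hours = min(end, 17:00) - max(start, 12:00), comparing time of day only (as.ITime)
  mutate(interval_12_17 = difftime(pmin(as.ITime(end_time), as.ITime("17:00:00")),
                                   pmax(as.ITime(start_time), as.ITime("12:00:00")),
                                   units = "hours"))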
# court_number start_time end_time service interval_12_17
#1: court 2 2020-03-01 11:00:00 2020-03-01 12:30:00 booking 0.5 hours
#2: court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching 1.0 hours
#3: court 1 2020-03-01 11:00:00 2020-03-01 13:00:00 booking 1.0 hours
#4: court 5 2020-03-01 12:00:00 2020-03-01 16:00:00 booking 4.0 hours
#5: court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching 0.5 hours
Or, using data.table:
# := adds the column by reference; as.ITime() reads the time of day straight from the strings
dt[, interval_12_17 := difftime(pmin(as.ITime(end_time), as.ITime("17:00:00")),
                                pmax(as.ITime(start_time), as.ITime("12:00:00")),
                                units = "hours")][]
# court_number start_time end_time service interval_12_17
#1: court 2 2020-03-01 11:00:00 2020-03-01 12:30:00 booking 0.5 hours
#2: court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching 1.0 hours
#3: court 1 2020-03-01 11:00:00 2020-03-01 13:00:00 booking 1.0 hours
#4: court 5 2020-03-01 12:00:00 2020-03-01 16:00:00 booking 4.0 hours
#5: court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching 0.5 hours
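The answer's formula assumes every booking overlaps the window. When a booking does not, pmin(end, to) - pmax(start, from) goes negative. For example, against an 08:00 to 10:00 window, court 2's 11:00 to 12:30 booking would come out as -1 hour. Below is a minimal sketch, assuming data.table is installed. It clamps the result at 0 and takes the window boundaries as arguments, so any daily interval can be checked. hours_in_window() is a hypothetical helper written for this example, not a data.table function. Like the answer, it compares time of day only, so it assumes each booking starts and ends on the same date.

```r
library(data.table)

# Hypothetical helper (not from the original answer): hours of each booking
# that fall inside the daily window [from, to]. Uses time of day only, so it
# assumes every booking starts and ends on the same date.
hours_in_window <- function(start_time, end_time, from, to) {
  # seconds since midnight (ITime is stored as an integer)
  s  <- as.integer(as.ITime(start_time))
  e  <- as.integer(as.ITime(end_time))
  lo <- as.integer(as.ITime(from))
  hi <- as.integer(as.ITime(to))
  # overlap length, clamped at 0 so bookings outside the window count as 0 hours
  pmax(0, pmin(e, hi) - pmax(s, lo)) / 3600
}

# same sample data as in the question
dt <- data.table(
  court_number = c("court 2", "court 3", "court 1", "court 5", "court 5"),
  start_time = c("2020-03-01 11:00:00", "2020-03-01 12:30:00", "2020-03-01 11:00:00",
                 "2020-03-01 12:00:00", "2020-03-01 16:30:00"),
  end_time   = c("2020-03-01 12:30:00", "2020-03-01 13:30:00", "2020-03-01 13:00:00",
                 "2020-03-01 16:00:00", "2020-03-01 18:30:00"),
  service = c("booking", "coaching", "booking", "booking", "coaching")
)

# the window from the question, plus one that no sample booking touches
dt[, interval_12_17 := hours_in_window(start_time, end_time, "12:00:00", "17:00:00")]
dt[, interval_08_10 := hours_in_window(start_time, end_time, "08:00:00", "10:00:00")]

# total hours booked inside 12:00 to 17:00
total_12_17 <- dt[, sum(interval_12_17)]
```

The helper returns plain numbers rather than a difftime, so interval_12_17 holds 0.5, 1, 1, 4, 0.5. That matches the column the question asked for, and total_12_17 is 7 for the sample data. For bookings that cross midnight or span several days, the comparison would need to use the full datetimes instead of as.ITime().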
Original question on Stack Overflow (r: how to calculate the number of hours in a given time interval): https://stackoverflow.com/questions/62225823/