Suppose I have a data frame with a start time column, an end time column, a measurement column, and a measurement time column, as follows:
start end value time
9:01:00 9:02:00 30.6 2013-03-25 9:05:00
9:01:00 9:02:00 30.8 2013-03-25 9:15:00
9:46:00 9:46:00 28.2 2013-03-25 9:43:00
9:46:00 9:46:00 28.9 2013-03-25 9:53:00
10:54:00 10:59:00 13.4 2013-03-25 10:56:00
10:54:00 10:59:00 13.8 2013-03-25 11:56:00
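How can I subset this data frame so that it contains only the rows where the time column falls between the start and end times, or within ten minutes before the start time and ten minutes after the end time? I chose ten minutes arbitrarily, and I would like to know how to do this for any amount of time before and after the start and end times.
The resulting data frame would be: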
start end value time
9:01:00 9:02:00 30.6 2013-03-25 9:05:00
9:46:00 9:46:00 28.2 2013-03-25 9:43:00
9:46:00 9:46:00 28.9 2013-03-25 9:53:00
10:54:00 10:59:00 13.4 2013-03-25 10:56:00
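Is there another way to do this besides subtracting x minutes from the start column entries, adding x minutes to the end column entries, and then subsetting based on whether the time column falls within those extended windows?
Currently, I have converted the time columns to POSIXlt format. Unfortunately, this assigns today's date to the times in the start and end columns.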
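To illustrate the date problem described above, here is a minimal sketch (the variable names `x` and `y` are only for illustration). When the format string contains no date fields, R's `strptime()` machinery fills in the current date, so a time-only value ends up on a different day than a fully dated measurement time:

```r
# A time-only string: the missing year, month, and day default to today.
x <- as.POSIXlt("09:01:00", format = "%H:%M:%S")
format(x, "%Y-%m-%d %H:%M:%S")

# A fully specified timestamp keeps its own date.
y <- as.POSIXlt("2013-03-25 09:05:00", format = "%Y-%m-%d %H:%M:%S")
format(y, "%Y-%m-%d %H:%M:%S")

# The two values differ by far more than a few minutes,
# so a direct comparison of x and y is not meaningful.
difftime(x, y, units = "days")
```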
Here is the dput() output of the first data frame:
structure(list(start = structure(list(sec = c(0, 0, 0, 0, 0,
0), min = c(1L, 1L, 46L, 46L, 54L, 54L), hour = c(9L, 9L, 9L,
9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L,
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L,
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), end = structure(list(sec = c(0,
0, 0, 0, 0, 0), min = c(2L, 2L, 46L, 46L, 59L, 59L), hour = c(9L,
9L, 9L, 9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L,
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L,
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), value = c(30.6, 30.8, 28.2,
28.9, 13.4, 13.8), time = structure(list(sec = c(0, 0, 0, 0,
0, 0), min = c(5L, 15L, 43L, 53L, 56L, 56L), hour = c(9L, 9L,
9L, 9L, 10L, 11L), mday = c(25L, 25L, 25L, 25L, 25L, 25L), mon = c(2L,
2L, 2L, 2L, 2L, 2L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(1L, 1L, 1L, 1L, 1L, 1L), yday = c(83L, 83L, 83L,
83L, 83L, 83L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"))), .Names = c("start", "end",
"value", "time"), row.names = c(NA, -6L), class = "data.frame")
Here is the dput() output of the second data frame:
structure(list(start = structure(list(sec = c(0, 0, 0, 0), min = c(1L,
46L, 46L, 54L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L,
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L),
isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour",
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt",
"POSIXt")), end = structure(list(sec = c(0, 0, 0, 0), min = c(2L,
46L, 46L, 59L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L,
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L),
isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour",
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt",
"POSIXt")), value = c(30.6, 28.2, 28.9, 13.4), time = structure(list(
sec = c(0, 0, 0, 0), min = c(5L, 43L, 53L, 56L), hour = c(9L,
9L, 9L, 10L), mday = c(25L, 25L, 25L, 25L), mon = c(2L, 2L,
2L, 2L), year = c(113L, 113L, 113L, 113L), wday = c(1L, 1L,
1L, 1L), yday = c(83L, 83L, 83L, 83L), isdst = c(1L, 1L,
1L, 1L)), .Names = c("sec", "min", "hour", "mday", "mon",
"year", "wday", "yday", "isdst"), class = c("POSIXlt", "POSIXt"
))), .Names = c("start", "end", "value", "time"), row.names = c(NA,
-4L), class = "data.frame")
Best answer
It's no fun to recreate, but the answer should be simple:
data[with(data, time > start - 10*60 & time < end + 10*60),]
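This assumes that the start, end, and time objects are actually comparable (i.e., they share the same year and day); otherwise, just convert the substring corresponding to the time of day to POSIX.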
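For the "any number of minutes" part of the question, the one-liner can be wrapped in a small function. This is only a sketch under assumptions: `subset_window`, `pad_min`, and `mk` are hypothetical names, the data are rebuilt as POSIXct on a single shared date, and the bounds are inclusive (`>=`/`<=`) rather than strict as in the line above.

```r
# Hypothetical helper: keep rows whose `time` lies within `pad_min` minutes
# of the [start, end] window. Assumes all three columns are POSIXct values
# on the same date.
subset_window <- function(df, pad_min = 10) {
  pad <- pad_min * 60  # POSIXct arithmetic works in seconds
  keep <- df$time >= df$start - pad & df$time <= df$end + pad
  df[keep, , drop = FALSE]
}

# Rebuild the example data with one shared date so the columns are comparable.
mk <- function(x) {
  as.POSIXct(paste("2013-03-25", x), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}
data <- data.frame(
  start = mk(c("09:01:00", "09:01:00", "09:46:00", "09:46:00", "10:54:00", "10:54:00")),
  end   = mk(c("09:02:00", "09:02:00", "09:46:00", "09:46:00", "10:59:00", "10:59:00")),
  value = c(30.6, 30.8, 28.2, 28.9, 13.4, 13.8),
  time  = mk(c("09:05:00", "09:15:00", "09:43:00", "09:53:00", "10:56:00", "11:56:00"))
)

res <- subset_window(data, pad_min = 10)
res$value
```

Changing `pad_min` widens or narrows the window; with `pad_min = 15`, for example, the 9:15 measurement would also be kept.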
Update: OK, since your dates are off, you need to recreate them so that they are "in sync", e.g.:
data$start <- as.POSIXct(substr(data$start,12,19), format="%H:%M:%S")
data$end <- as.POSIXct(substr(data$end,12,19), format="%H:%M:%S")
data$time <- as.POSIXct(substr(data$time,12,19), format="%H:%M:%S")
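Now the line above gives what you want. You should perhaps be careful about how you encode POSIX times from your original data. Also, for most applications POSIXct is probably preferable to POSIXlt, which stores each date-time as a list of components. That may hinder (or slow down) some operations down the road.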
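As a rough illustration of the POSIXct versus POSIXlt point (the variable names here are only for illustration): a POSIXlt value is a list of calendar components, while a POSIXct value is a single number of seconds since the epoch, which is why adding or subtracting seconds on POSIXct is straightforward.

```r
t_lt <- as.POSIXlt("2013-03-25 09:05:00", tz = "UTC")
t_ct <- as.POSIXct(t_lt)  # convert the list-based value to a numeric one

typeof(t_lt)  # "list"
typeof(t_ct)  # "double"

# POSIXlt exposes calendar components by name.
t_lt$hour
t_lt$min

# Adding ten minutes to a POSIXct value is plain arithmetic in seconds.
later <- t_ct + 10 * 60
as.numeric(difftime(later, t_ct, units = "mins"))
```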
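Regarding "R: subsetting a data frame based on time within a certain number of minutes of an observation window", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18092081/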