Suppose I have a data.table:
library(data.table)
dt <- data.table(
datetime = seq(as.POSIXct("2016-01-01 00:00:00"),as.POSIXct("2016-01-01 10:00:00"), by = "1 hour"),
ObType = c("A","A","B","B","B","B","A","A","B","A","A")
)
dt
datetime ObType
1: 2016-01-01 00:00:00 A
2: 2016-01-01 01:00:00 A
3: 2016-01-01 02:00:00 B
4: 2016-01-01 03:00:00 B
5: 2016-01-01 04:00:00 B
6: 2016-01-01 05:00:00 B
7: 2016-01-01 06:00:00 A
8: 2016-01-01 07:00:00 A
9: 2016-01-01 08:00:00 B
10: 2016-01-01 09:00:00 A
11: 2016-01-01 10:00:00 A
Wherever ObType is "B", I need to find the time to the nearest ObType "A" on either side. So the result should look like this (in hours):
datetime ObType timeLag timeLead
1: 2016-01-01 00:00:00 A NA NA
2: 2016-01-01 01:00:00 A NA NA
3: 2016-01-01 02:00:00 B 1 4
4: 2016-01-01 03:00:00 B 2 3
5: 2016-01-01 04:00:00 B 3 2
6: 2016-01-01 05:00:00 B 4 1
7: 2016-01-01 06:00:00 A NA NA
8: 2016-01-01 07:00:00 A NA NA
9: 2016-01-01 08:00:00 B 1 1
10: 2016-01-01 09:00:00 A NA NA
11: 2016-01-01 10:00:00 A NA NA
I usually use data.table, but non-data.table solutions are fine too.
Thanks!
赖斯
Best answer
I suggest an approach using roll=:
X = dt[ObType=="A"]
X
datetime ObType
1: 2016-01-01 00:00:00 A
2: 2016-01-01 01:00:00 A
3: 2016-01-01 06:00:00 A
4: 2016-01-01 07:00:00 A
5: 2016-01-01 09:00:00 A
6: 2016-01-01 10:00:00 A
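# Lag: roll=Inf carries the last preceding "A" forward onto each "B" row
dt[ObType=="B", Lag:=X[.SD,on="datetime",roll=Inf,i.datetime-x.datetime]]
# Lead: roll=-Inf rolls the next following "A" backward onto each "B" row
dt[ObType=="B", Lead:=X[.SD,on="datetime",roll=-Inf,x.datetime-i.datetime]]
# Nearest: whichever "A" is closer; a negative value means that "A" is earlier
dt[ObType=="B", Nearest:=X[.SD,on="datetime",roll="nearest",x.datetime-i.datetime]]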
dt
datetime ObType Lag Lead Nearest
1: 2016-01-01 00:00:00 A NA hours NA hours NA hours
2: 2016-01-01 01:00:00 A NA hours NA hours NA hours
3: 2016-01-01 02:00:00 B 1 hours 4 hours -1 hours
4: 2016-01-01 03:00:00 B 2 hours 3 hours -2 hours
5: 2016-01-01 04:00:00 B 3 hours 2 hours 2 hours
6: 2016-01-01 05:00:00 B 4 hours 1 hours 1 hours
7: 2016-01-01 06:00:00 A NA hours NA hours NA hours
8: 2016-01-01 07:00:00 A NA hours NA hours NA hours
9: 2016-01-01 08:00:00 B 1 hours 1 hours -1 hours
10: 2016-01-01 09:00:00 A NA hours NA hours NA hours
11: 2016-01-01 10:00:00 A NA hours NA hours NA hours
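The question asked for plain numbers of hours rather than difftime values. A minimal sketch of one way to get that, assuming the same two rolling joins as above and converting each result with as.numeric(..., units = "hours"). The column names timeLag and timeLead follow the question's expected output, and tz = "UTC" is an assumption added only to make the example reproducible:

```r
library(data.table)

# Rebuild the question's table (tz = "UTC" is an assumption for reproducibility)
dt <- data.table(
  datetime = seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2016-01-01 10:00:00", tz = "UTC"), by = "1 hour"),
  ObType = c("A", "A", "B", "B", "B", "B", "A", "A", "B", "A", "A")
)
X <- dt[ObType == "A"]

# Same rolling joins as in the answer, converted from difftime to numeric hours
dt[ObType == "B", timeLag := as.numeric(
  X[.SD, on = "datetime", roll = Inf, i.datetime - x.datetime], units = "hours")]
dt[ObType == "B", timeLead := as.numeric(
  X[.SD, on = "datetime", roll = -Inf, x.datetime - i.datetime], units = "hours")]
dt
```

Fixing the units explicitly avoids depending on the unit that difftime picks automatically, which could differ (for example minutes) with other data.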
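One advantage of roll= is that you can apply a staleness limit by changing Inf to the maximum gap you are willing to join across. The limit applies to the time difference, not to a number of rows; Inf simply means no limit. The sign of roll= determines whether it looks backward or forward (lag or lead). Another advantage is that roll= is fast.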
For "r - Find the time to the nearest occurrence of a specific value for each row", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42379658/