I have a dataset that looks like this:
ID Cond Time1 Time2
1 2 Start Stop1
1 3 Start abc
1 1 abc Stop2
1 2 Start abc
1 2 abc Stop1
2 2 Start abc
2 4 abc jkl
2 3 abc jkl
2 2 abc jkl
2 3 abc Stop2
3 2 Start abc
3 3 abc Stop2
3 2 Start Stop1
3 3 Start Stop1
3 3 Start abc
3 2 abc jkl
3 4 baba Stop1
4 2 Start Stop2
4 1 Start asd
4 2 abc Stop2
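I need to filter the data based on a few conditions. Whenever Cond = 2 and Time1 = Start, I need to keep every row from that point up to and including the first stop point (Time2 equal to Stop1 or Stop2). Essentially, the result should look like this: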
ID Cond Time1 Time2
1 2 Start Stop1
1 2 Start abc
1 2 abc Stop1
2 2 Start abc
2 4 abc jkl
2 3 abc jkl
2 2 abc jkl
2 3 abc Stop2
3 2 Start abc
3 3 abc Stop2
3 2 Start Stop1
4 2 Start Stop2
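In addition, the real dataset has more than 140,000 observations, so efficiency is key. I am considering the dplyr package, but I am not sure how to approach the problem.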
Best answer
Using dplyr:
dframe = read.table(header = TRUE, text = "ID Cond Time1 Time2
1 2 Start Stop1
1 3 Start abc
1 1 abc Stop2
1 2 Start abc
1 2 abc Stop1
2 2 Start abc
2 4 abc jkl
2 3 abc jkl
2 2 abc jkl
2 3 abc Stop2
3 2 Start abc
3 3 abc Stop2
3 2 Start Stop1
3 3 Start Stop1
3 3 Start abc
3 2 abc jkl
3 4 baba Stop1
4 2 Start Stop2
4 1 Start asd
4 2 abc Stop2")
library(dplyr)

# add a row index so that row order can be compared after the joins
dframe = data.frame(index = seq_len(nrow(dframe)), dframe)
head(dframe)

# get starting points: rows where Cond is 2 and Time1 is "Start"
start_points = dframe %>%
  filter(Cond == 2 & Time1 == 'Start') %>%
  select(index, ID)

# get stopping points: rows whose Time2 begins with "Stop" (Stop1 or Stop2)
stop_points = dframe %>%
  filter(substr(Time2, 1, 4) == 'Stop') %>%
  select(index, ID)

# get the stopping point associated with each start point:
# the first stop at or after the start, within the same ID
start_stop = start_points %>%
  left_join(stop_points, by = "ID") %>%
  filter(index.x <= index.y) %>%
  group_by(ID, index.x) %>%
  summarise(index.y = min(index.y)) %>%
  ungroup() %>%
  rename(start_index = index.x, stop_index = index.y)

# keep the rows between each start and its stop (inclusive)
result = start_stop %>%
  left_join(dframe, by = "ID") %>%
  filter(start_index <= index, index <= stop_index) %>%
  select(-c(start_index, stop_index, index))
> result
Source: local data frame [12 x 4]

      ID  Cond  Time1  Time2
   (int) (int) (fctr) (fctr)
1      1     2  Start  Stop1
2      1     2  Start    abc
3      1     2    abc  Stop1
4      2     2  Start    abc
5      2     4    abc    jkl
6      2     3    abc    jkl
7      2     2    abc    jkl
8      2     3    abc  Stop2
9      3     2  Start    abc
10     3     3    abc  Stop2
11     3     2  Start  Stop1
12     4     2  Start  Stop2
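
Since the question stresses efficiency on 140,000+ rows, here is a rough sketch of an alternative that avoids the two joins by using grouped cumulative operations. This is not the method from the answer above, only one possible variant. The helper columns (pos, is_start, is_stop, last_start, last_stop_before, stop_ahead) and the name result2 are made up for illustration, and the sketch assumes the rows are already in the intended order within each ID. A row is kept when the most recent start at or before it comes after the most recent stop strictly before it, and some stop exists at or after it.

```r
library(dplyr)

# Same sample data as in the question
dframe <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID Cond Time1 Time2
1 2 Start Stop1
1 3 Start abc
1 1 abc Stop2
1 2 Start abc
1 2 abc Stop1
2 2 Start abc
2 4 abc jkl
2 3 abc jkl
2 2 abc jkl
2 3 abc Stop2
3 2 Start abc
3 3 abc Stop2
3 2 Start Stop1
3 3 Start Stop1
3 3 Start abc
3 2 abc jkl
3 4 baba Stop1
4 2 Start Stop2
4 1 Start asd
4 2 abc Stop2")

result2 <- dframe %>%
  group_by(ID) %>%
  mutate(
    pos      = row_number(),                       # position within the ID (hypothetical helper)
    is_start = Cond == 2 & Time1 == "Start",
    is_stop  = startsWith(as.character(Time2), "Stop"),
    # position of the most recent start at or before this row (0 if none)
    last_start = cummax(ifelse(is_start, pos, 0L)),
    # position of the most recent stop strictly before this row (0 if none)
    last_stop_before = lag(cummax(ifelse(is_stop, pos, 0L)), default = 0L),
    # TRUE if a stop exists at or after this row, so the window actually closes
    stop_ahead = rev(cumsum(rev(is_stop))) > 0
  ) %>%
  ungroup() %>%
  filter(last_start > last_stop_before, stop_ahead) %>%
  select(ID, Cond, Time1, Time2)

result2
```

On the sample data this keeps the same 12 rows as the join-based answer. One difference to be aware of: if two start rows occur before the same stop, the join-based version returns the overlapping rows twice, while this sketch returns each row once.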
Regarding "r - filter between start and stop points", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37947180/