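I have a time-series cross-sectional dataset. In the value column, the value changes to TRUE after a run of FALSE values. I want to filter the dataset to keep all TRUE values and the 4 FALSE values preceding them.
The example dataset and the desired result are shown below: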
library(dplyr)  # provides tibble() and the %>% pipe

df = tibble(country = c("A", "A", "A", "A", "A", "A", "A",
                        "B", "B", "B", "B", "B", "B", "B",
                        "C", "C", "C", "C", "C", "C", "C",
                        "D", "D", "D", "D", "D", "D", "D"),
            year = c("2010", "2011", "2012", "2013", "2014", "2015", "2016",
                     "2010", "2011", "2012", "2013", "2014", "2015", "2016",
                     "2010", "2011", "2012", "2013", "2014", "2015", "2016",
                     "2010", "2011", "2012", "2013", "2014", "2015", "2016"),
            value = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
                      TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
                      TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
                      TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE))
outcome = tibble(country = c("A", "A", "A", "A", "A",
                             "B", "B", "B", "B", "B",
                             "C", "C", "C", "C", "C",
                             "D", "D", "D", "D", "D"),
                 year = c("2011", "2012", "2013", "2014", "2015",
                          "2011", "2012", "2013", "2014", "2015",
                          "2011", "2012", "2013", "2014", "2015",
                          "2011", "2012", "2013", "2014", "2015"),
                 value = c(FALSE, FALSE, FALSE, FALSE, TRUE,
                           FALSE, FALSE, FALSE, FALSE, TRUE,
                           FALSE, FALSE, FALSE, FALSE, TRUE,
                           FALSE, FALSE, FALSE, FALSE, TRUE))
Best answer
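Use a "reverse" cumulative sum. rev(cumsum(rev(value))) gives each TRUE row and the FALSE rows directly before it the same id, so grouping by country and that id and keeping only groups of exactly 5 rows returns each TRUE together with its 4 preceding FALSE rows: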
df %>%
  group_by(country, cum = rev(cumsum(rev(value)))) %>%
  filter(n() == 5) %>%
  ungroup() %>%
  select(-cum)
# A tibble: 20 x 3
   country year  value
   <chr>   <chr> <lgl>
 1 A       2011  FALSE
 2 A       2012  FALSE
 3 A       2013  FALSE
 4 A       2014  FALSE
 5 A       2015  TRUE
 6 B       2011  FALSE
 7 B       2012  FALSE
 8 B       2013  FALSE
 9 B       2014  FALSE
10 B       2015  TRUE
11 C       2011  FALSE
12 C       2012  FALSE
13 C       2013  FALSE
14 C       2014  FALSE
15 C       2015  TRUE
16 D       2011  FALSE
17 D       2012  FALSE
18 D       2013  FALSE
19 D       2014  FALSE
20 D       2015  TRUE
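
To see why the grouping works, it may help to look at the reverse cumulative sum on a single country's values. The following is a minimal base R sketch, not part of the original answer; the names cum, group_size and keep are illustrative only, and ave() stands in for the group_by() / n() step.

```r
# One country's values from the example data (years 2010 to 2016).
value <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Reverse, accumulate, reverse back: each row gets the number of TRUE
# values at or after its own position.
cum <- rev(cumsum(rev(value)))
print(cum)
# 2 1 1 1 1 1 0

# Number of rows sharing each id (base R stand-in for n() per group).
group_size <- ave(seq_along(cum), cum, FUN = length)

# Keep rows whose group has exactly 5 members:
# four FALSE rows plus the TRUE that ends the run.
keep <- group_size == 5
print(value[keep])
# FALSE FALSE FALSE FALSE TRUE
```

Note that the n() == 5 condition relies on exactly four FALSE rows preceding the TRUE, as in the example data. A TRUE preceded by a longer run of FALSE rows would land in a group of more than 5 rows and be dropped entirely.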
Regarding "r - Filter certain values and multiple previous rows with another condition", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76009081/