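r - How to remove rows of a data frame after a specific string is detected, using filter and dplyr

I have data like the example below. For each participant, if a specific string ("trial_end") appears in the my_strings column, I want to remove all rows that come after it.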

library(dplyr)
library(stringr)
library(tibble)

df1 <- tibble::tribble(
  ~participant_id, ~timestamp,     ~my_strings,
  1L,        1L,  "other_string",
  1L,        2L,  "other_string",
  1L,        3L, "trial_end",
  1L,        4L,  "other_string",
  2L,        1L,  "other_string",
  2L,        2L,  "other_string",
  2L,        3L,  "other_string",
  2L,        4L,  "other_string",
  3L,        1L,  "other_string",
  3L,        2L, "trial_end",
  3L,        3L,  "other_string",
  3L,        4L,  "other_string"
)

My first attempt used str_detect to find the string, which to get its row number, and then filter to keep only that row and all rows before it:

df2 <- df1 %>%
  group_by(participant_id) %>%
  filter(row_number() < (which(str_detect(my_strings, "trial_end"))) + 1)

This seems to throw an error when the string is not detected (for example, participant 2 in the example here).
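
The failure can be reproduced in base R, without dplyr. The following is a minimal sketch in which a plain vector stands in for a group that never contains "trial_end" (the names `hit` and `keep` are illustrative and do not appear in the original code). When nothing matches, `which()` returns a zero-length vector, so the comparison also has length zero, and `filter()` needs a condition whose length is either 1 or the group size.

```r
# A group in which "trial_end" never occurs, like participant 2
my_strings <- rep("other_string", 4)

# which() returns the positions of TRUE values; with no match it is empty
hit <- which(my_strings == "trial_end")
length(hit)

# Comparing against a zero-length vector gives a zero-length result,
# not one TRUE/FALSE per row
keep <- seq_along(my_strings) < hit + 1
length(keep)
```

Both lengths are 0 here, which is why any fix has to handle the no-match case explicitly.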

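My next attempt was to add a conditional if_else, effectively trying to say: "if the target string is detected, remove all rows after it for that participant; otherwise, if the string is not detected, do nothing."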

df3 <- df1 %>% 
  group_by(participant_id) %>%
  if_else(str_detect(my_strings, "trial_end"),
        filter(row_number() < (which(str_detect(my_strings, "trial_end"))) + 1),
        filter(timestamp < max(timestamp)))

This also returned an error: Error: `condition` must be a logical vector, not a `grouped_df/tbl_df/tbl/data.frame` object.

My final attempt put a conditional if/else inside filter, following another answer already posted here, but this also produced an error.

df4 <- df1 %>% 
  group_by(participant_id) %>%
  filter(if(str_detect(my_strings, "trial_end") < (which(str_detect(my_strings, "trial_end")) + 1)) 
            else < n())

Can anyone point out the best way to solve this? Is filter the wrong approach?

Many thanks.

For clarity, the desired result is shown below:

desired_output <- tibble::tribble(
                    ~participant_id, ~timestamp,    ~my_strings,
                                 1L,         1L, "other_string",
                                 1L,         2L, "other_string",
                                 1L,         3L,    "trial_end",
                                 2L,         1L, "other_string",
                                 2L,         2L, "other_string",
                                 2L,         3L, "other_string",
                                 2L,         4L, "other_string",
                                 3L,         1L, "other_string",
                                 3L,         2L,    "trial_end"
                    )

Best answer

One option could be:

df1 %>%
    group_by(participant_id) %>%
    slice(if(all(my_strings != "trial_end")) 1:n() else 1:which(my_strings == "trial_end"))

  participant_id timestamp my_strings  
           <int>     <int> <chr>       
1              1         1 other_string
2              1         2 other_string
3              1         3 trial_end   
4              2         1 other_string
5              2         2 other_string
6              2         3 other_string
7              2         4 other_string
8              3         1 other_string
9              3         2 trial_end
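
On the question of whether filter is the wrong tool: it can also do the job. The sketch below shows two possible filter-based variants, assuming an exact match on "trial_end" as in the slice() answer above (the result names `kept_cumsum` and `kept_match` are illustrative only). The first keeps rows until a "trial_end" has been seen on an earlier row; the second compares the row number with the position of the first match and falls back to the group size when there is none.

```r
library(dplyr)
library(tibble)

df1 <- tibble::tribble(
  ~participant_id, ~timestamp, ~my_strings,
  1L, 1L, "other_string",
  1L, 2L, "other_string",
  1L, 3L, "trial_end",
  1L, 4L, "other_string",
  2L, 1L, "other_string",
  2L, 2L, "other_string",
  2L, 3L, "other_string",
  2L, 4L, "other_string",
  3L, 1L, "other_string",
  3L, 2L, "trial_end",
  3L, 3L, "other_string",
  3L, 4L, "other_string"
)

# Variant 1: lag() shifts the match flag down one row, so cumsum() only
# becomes positive on rows strictly after the first "trial_end"
kept_cumsum <- df1 %>%
  group_by(participant_id) %>%
  filter(cumsum(lag(my_strings == "trial_end", default = FALSE)) == 0) %>%
  ungroup()

# Variant 2: match() gives the position of the first "trial_end";
# nomatch = n() keeps the whole group when the string is absent
kept_match <- df1 %>%
  group_by(participant_id) %>%
  filter(row_number() <= match("trial_end", my_strings, nomatch = n())) %>%
  ungroup()
```

In the first variant, `my_strings == "trial_end"` could be swapped for `str_detect(my_strings, "trial_end")` if a substring match is wanted, as in the original attempts.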

A similar question about removing data frame rows after a specific string is detected, using filter and dplyr, can be found on Stack Overflow: https://stackoverflow.com/questions/66527812/
