r - Filter rows on the condition that at least two different keywords must be present

I have a data frame with speech data, like this:

df <- data.frame(
  id = 1:12,
  partcl = c("yeah yeah yeah absolutely", "well you know it 's", "oh well yeah that's right", 
             "yeah I mean well oh", "well erm well Peter will be there", "well yeah well", 
             "yes yes yes totally", "yeah yeah yeah yeah", "well well I did n't do it", 
             "er well yeah that 's true", "oh hey where 's he gone?", "er"
))

and a vector of keywords, parts:

parts <- c("yeah", "oh", "no", "well", "mm", "yes", "so", "right", "er", "like")

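What I need to do is filter the rows that contain at least two different parts values. What I can do so far is filter the rows that contain at least two parts values, regardless of whether they are different or the same: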

library(dplyr)
library(stringr)  # str_count() comes from stringr, not dplyr
df %>%
  filter(
    str_count(partcl, paste0("\\b(", paste0(parts, collapse = "|"), ")\\b")) > 1
  )
  id                            partcl
1  1         yeah yeah yeah absolutely
2  3         oh well yeah that's right
3  4               yeah I mean well oh
4  5 well erm well Peter will be there
5  6                    well yeah well
6  7               yes yes yes totally
7  8               yeah yeah yeah yeah
8  9         well well I did n't do it
9 10         er well yeah that 's true

How can I assert that the matched parts are distinct, so that the result looks like this:

  id                            partcl
1  3         oh well yeah that's right
2  4               yeah I mean well oh
3  6                    well yeah well
4 10         er well yeah that 's true

Best answer

This may help: use str_extract_all to extract the keywords, then check with n_distinct to keep only the rows that have more than one unique keyword.

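library(dplyr)
library(stringr)
library(purrr)
df %>% 
  filter(map_lgl(
    # extract every keyword match per row (a list of character vectors)
    str_extract_all(partcl, paste0("\\b(", paste0(parts, collapse = "|"), ")\\b")),
    # keep the row only if more than one distinct keyword was found
    ~ n_distinct(.x) > 1))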

-output

 id                    partcl
1  3 oh well yeah that's right
2  4       yeah I mean well oh
3  6            well yeah well
4 10 er well yeah that 's true
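
If you would rather not depend on purrr, the same idea can probably be sketched in base R. This is only an illustration, assuming df and parts are defined exactly as in the question; the names pattern, matches, n_unique and result are my own and do not come from the original answer.

```r
# Sample data and keyword vector copied from the question
df <- data.frame(
  id = 1:12,
  partcl = c("yeah yeah yeah absolutely", "well you know it 's", "oh well yeah that's right",
             "yeah I mean well oh", "well erm well Peter will be there", "well yeah well",
             "yes yes yes totally", "yeah yeah yeah yeah", "well well I did n't do it",
             "er well yeah that 's true", "oh hey where 's he gone?", "er")
)
parts <- c("yeah", "oh", "no", "well", "mm", "yes", "so", "right", "er", "like")

# Word-bounded alternation of all keywords (same pattern as in the answer)
pattern <- paste0("\\b(", paste(parts, collapse = "|"), ")\\b")

# gregexpr() locates every match; regmatches() returns them as one
# character vector per row (character(0) when a row has no match)
matches <- regmatches(df$partcl, gregexpr(pattern, df$partcl, perl = TRUE))

# Number of distinct keywords found in each row
n_unique <- vapply(matches, function(m) length(unique(m)), integer(1))

# Keep only the rows with at least two different keywords
result <- df[n_unique > 1, ]
result
```

The key step is the same as in the dplyr version: duplicates are collapsed (here with unique()) before counting, so a row such as "yeah yeah yeah yeah" counts as one keyword rather than four.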

Regarding "r - Filter rows on the condition that at least two different keywords must be present", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71053784/
