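I'm looking for a fast, concise way to check whether a string contains all elements of a given vector of words. I have come up with a few ideas, but I feel I'm missing something, especially since there is a very neat solution for checking whether a string contains any of the words.
What I have tried: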
# Example data
strings <- c(
"never going to do this again",
"never again",
"will repeat",
"never repeat",
"again tomorrow"
)
# Words we are looking for
ourWords <- c("never", "again")
# Check if string contains any of our words
grepl(paste0(ourWords, collapse = "|"), strings)
# Very neat solution but **not** what I am looking for
# Check if string contains **all** of our words
grepl(ourWords[1], strings, fixed = TRUE) &
grepl(ourWords[2], strings, fixed = TRUE)
# This is verbose, not very scalable, and seems inefficient
# Even less efficient alternative
vapply(
strsplit(strings, split = " "),
function(x) sum(ourWords %in% x) == length(ourWords),
logical(1)
)
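The chained grepl(...) & grepl(...) attempt above can be generalized to any number of words without building a regex. The sketch below is my own suggestion, not from the thread: `contains_all` is a hypothetical helper name. It runs one literal (fixed = TRUE) substring search per word and ANDs the resulting logical vectors with `Reduce`. Like the grepl version, it matches substrings, so "never" would also match "neverland".

```r
# Hypothetical helper: TRUE where a string contains every word (substring match)
contains_all <- function(strings, words) {
  # One logical vector per word, each from a literal (fixed) substring search
  hits <- lapply(words, grepl, x = strings, fixed = TRUE)
  # AND the vectors together; init makes an empty `words` return all TRUE
  Reduce(`&`, hits, init = rep(TRUE, length(strings)))
}

strings <- c(
  "never going to do this again",
  "never again",
  "will repeat",
  "never repeat",
  "again tomorrow"
)
ourWords <- c("never", "again")
contains_all(strings, ourWords)
# [1]  TRUE  TRUE FALSE FALSE FALSE
```

Because each pass uses fixed = TRUE, regex metacharacters in the words need no escaping. The cost is one pass over strings per word.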
Best answer
You can combine sprintf with multiple lookaheads:
strings <- c(
"never going to do this again",
"never again",
"will repeat",
"never repeat",
"again tomorrow"
)
ourWords <- c("never", "again")
regex <- paste0(sprintf("(?=.*%s)", ourWords), collapse = '')
strings[grepl(regex, strings, perl = TRUE)]
which in this case yields
[1] "never going to do this again" "never again"
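The idea is to use one lookahead per word. Each (?=.*word) asserts, without consuming any characters, that word occurs somewhere after the current position. Because all the lookaheads are tested at the same position, the pattern matches only when every word is present, in any order. perl = TRUE is required because R's default regex engine does not support lookaheads.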
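To make the lookahead pattern concrete, the sketch below prints the regex that the answer builds. It also shows a variant that goes beyond the answer. `contains_all_words` is a hypothetical name. It wraps each word in PCRE's \Q...\E so that metacharacters are taken literally, and in \b word boundaries so that "never" does not match "neverland". Both additions are assumptions about what you might want, not part of the original answer. The \b boundaries only behave as expected when each word begins and ends with a word character.

```r
ourWords <- c("never", "again")

# The pattern built in the answer: one lookahead per word
regex <- paste0(sprintf("(?=.*%s)", ourWords), collapse = "")
print(regex)
# [1] "(?=.*never)(?=.*again)"

# Hypothetical variant: \Q...\E quotes each word literally (PCRE),
# \b requires whole-word matches (both are assumptions, not from the answer)
contains_all_words <- function(strings, words) {
  pattern <- paste0(sprintf("(?=.*\\b\\Q%s\\E\\b)", words), collapse = "")
  grepl(pattern, strings, perl = TRUE)
}

contains_all_words(c("again, never", "neverland again", "never again"), ourWords)
# [1]  TRUE FALSE  TRUE
```

The first string shows that word order does not matter. The second is rejected only because of the \b boundaries. The answer's original pattern would accept it.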
For a similar Stack Overflow question about using R grepl to check whether a string contains all of our words, see: https://stackoverflow.com/questions/48949006/