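I have a set of strings that I need to manipulate. For each string, if it contains one of a given set of substrings, I want to keep only that substring; otherwise the string should be left unchanged.
Here is an example: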
keep <- c("USA","UNITED STATES")
keep <- paste0(paste0(" ",keep,"$"),collapse="|")
data <- c("DETROIT","DETROIT USA","DETROIT UNITED STATES")
expected_result <- c("DETROIT","USA","UNITED STATES")
Best answer
You can use
data <- c("DETROIT","DETROIT USA","DETROIT UNITED STATES")
keep <- c("USA","UNITED STATES")
regex <- paste0(".*\\s*\\b(",paste0(keep,collapse="|"), ")\\b")
sub(regex, "\\1", data)
## => [1] "DETROIT" "USA" "UNITED STATES"
See the R demo online.
The regex is .*\s*\b(USA|UNITED STATES)\b (see its online demo).
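Details:
.* - zero or more of any characters, as many as possible
\s* - zero or more whitespace characters
\b(USA|UNITED STATES)\b - the whole word USA or UNITED STATES, captured into Group 1 (\1 in the replacement pattern).
When the pattern does not match, sub returns the string unchanged, which is why "DETROIT" is kept as is.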
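As a minimal sketch of how the pieces above fit together, the pattern construction and the substitution can be wrapped in a small function. The name keep_keyword and its arguments are hypothetical and are not part of the original answer. The sketch also assumes the keywords contain no regex metacharacters, and it drops the \s* part because the greedy .* already consumes any whitespace before the keyword.

```r
# Hypothetical helper (not from the original answer): keep only the matched
# keyword when a string contains one, otherwise return the string unchanged.
# Assumes the keywords contain no regex metacharacters.
keep_keyword <- function(x, keywords) {
  # Build the alternation, e.g. "USA|UNITED STATES"
  alternation <- paste0(keywords, collapse = "|")
  # Greedy .* consumes everything that precedes the whole-word keyword
  regex <- paste0(".*\\b(", alternation, ")\\b")
  # Replace the matched text with capture group 1; strings without a match
  # are returned as they are
  sub(regex, "\\1", x)
}

data <- c("DETROIT", "DETROIT USA", "DETROIT UNITED STATES")
keep_keyword(data, c("USA", "UNITED STATES"))
```

Note that the pattern only consumes text that comes before the keyword, so anything following the keyword in a string would be left in place.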
On the topic of removing the text before an array of subtexts, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66222589/