r - How to remove the words before and after a specific word in a string using R?

Tags: r gsub

I have the following df:

structure(list(id = c(9L, 10L, 11L, 96L, 97L, 101L, 103L, 248L, 
499L, 1044L), leg_activity = c("home, adpt, shop, car_passenger, home, adpt, work, adpt, home pt,, work pt,, outside, outside, outside pt,, outside pt,, pt, home", 
"home pt,, pt, outside, outside, outside, outside pt,, pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home", 
"home pt,, work, adpt, home", "home, car, work, car, home pt,, work, adpt, home", 
"home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, car_passenger, outside, outside, outside, car_passenger, outside, outside, outside, car_passenger, home", 
"home, bike, outside, outside, outside, car_passenger, outside, outside, outside, car_passenger, outside, outside, outside, bike, home, adpt, leisure, adpt, home, bike, leisure, bike, home", 
"home, adpt, work, adpt, home, walk, other, pt, home", "home, adpt, work, walk, home, adpt, work, walk, home", 
"home, adpt, leisure, adpt, home, bike, outside, outside, outside, bike, home", 
"home, pt, work, adpt, home, adpt, work, adpt, home")), row.names = c(NA, 
10L), class = "data.frame")

As you can see, the leg_activity column contains strings. What I want is to remove every word associated with outside.

To be more specific, take this hypothetical row as an example:

"home, bike, outside, outside, outside, car_passenger, outside, outside,  bike, home, adpt, bike, leisure, bike, home"

The goal is to remove the word before outside and the word after outside, and finally outside itself as well. The desired output:

"home, home, adpt, bike, leisure, bike, home"

So far I have only been able to remove a specific word:

agents$leg_activity <- gsub(', home', '', agents$leg_activity)
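To illustrate why a plain gsub call falls short here, the following minimal sketch runs it on a made-up string (the variable names `x` and `res` are only for illustration). It removes the fixed pattern itself but leaves the neighbouring words in place:

```r
# Made-up example string, not taken from the data above
x <- "home, bike, outside, outside, outside, bike, home"

# gsub() only deletes the literal pattern ", outside";
# the words before and after it ("bike") are untouched
res <- gsub(", outside", "", x, fixed = TRUE)
print(res)
# [1] "home, bike, bike, home"
```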

Thank you very much for your help!

Best answer

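We can split the string on commas, use grep to find the positions where "outside" occurs, and drop those elements together with the values immediately before and after them.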

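# Split each string on one or more commas followed by whitespace
agents$new_col <- sapply(strsplit(agents$leg_activity, ',{1,}\\s'), function(x) {
              # Positions of every element that contains "outside"
              inds <- grep('outside', x)
              # Drop the matches and their neighbours; a zero or out-of-range
              # index is ignored by negative subsetting
              if (length(inds)) toString(x[-unique(c(inds - 1, inds, inds + 1))])
              else toString(x)
})
agents$new_col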

# [1] "home, adpt, shop, car_passenger, home, adpt, work, adpt, home pt, home"                                                                                       
# [2] "home pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home"
# [3] "home pt, work, adpt, home"                                                                                                                                    
# [4] "home, car, work, car, home pt, work, adpt, home"                                                                                                              
# [5] "home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, home"                                                                              
# [6] "home, home, adpt, leisure, adpt, home, bike, leisure, bike, home"                                                                                             
# [7] "home, adpt, work, adpt, home, walk, other, pt, home"                                                                                                          
# [8] "home, adpt, work, walk, home, adpt, work, walk, home"                                                                                                         
# [9] "home, adpt, leisure, adpt, home, home"                                                                                                                        
#[10] "home, pt, work, adpt, home, adpt, work, adpt, home"  
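
As a check against the hypothetical row from the question, the same idea can be sketched as a small standalone function. This is not part of the original answer: the name `drop_around` and its `word` argument are hypothetical, and it uses a logical mask instead of negative indices so that no special case is needed when nothing matches. It also splits on `,+\\s*` so that the double space in the example row does not leave stray whitespace.

```r
# Hypothetical helper (not from the original answer): drop every element
# containing `word`, together with its immediate left and right neighbours.
drop_around <- function(s, word = "outside") {
  # Split on one or more commas plus any following whitespace
  parts <- trimws(strsplit(s, ",+\\s*")[[1]])
  parts <- parts[nzchar(parts)]
  n <- length(parts)

  # TRUE where the element itself contains the word
  hit <- grepl(word, parts, fixed = TRUE)

  # TRUE where the element is a hit, or its right or left neighbour is a hit
  near <- hit | c(hit[-1], FALSE) | c(FALSE, hit[-n])

  toString(parts[!near])
}

row <- "home, bike, outside, outside, outside, car_passenger, outside, outside,  bike, home, adpt, bike, leisure, bike, home"
print(drop_around(row))
# [1] "home, home, adpt, bike, leisure, bike, home"
```

Assuming the data frame is called agents as in the question, the helper could be applied to the whole column with vapply(agents$leg_activity, drop_around, character(1), USE.NAMES = FALSE).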

Regarding "r - How to remove the words before and after a specific word in a string using R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62245449/
