I have a data frame like this:
df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('https://google.com', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    col3 = c('http://www.bbcnews.com?id=321', 'http://google.com?id=1234','NA','https://bbcnews.com/search'),
                    col4 = c('NA', 'https://www.youtube/com','NA', 'www.youtube.com/searcht'))
In col1, col2, and col3, how can I keep only the cells that contain "google", "youtube", or "bbc", and set every other cell to NA?
Example of expected output:
   x                         col1                           col2                          col3                    col4
1 x1 http://youtube.com/something             https://google.com http://www.bbcnews.com?id=321                      NA
2 x2                           NA http://www.bbcnews2.com?id=321     http://google.com?id=1234 https://www.youtube/com
3 x3                           NA                             NA                            NA                      NA
4 x4                           NA        https://google.com/text    https://bbcnews.com/search www.youtube.com/searcht
Best answer
We can use mutate_at to modify columns 'col1' through 'col4', using str_detect to check whether each value contains 'google', 'youtube', or 'bbc', and replacing every other element with NA.
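library(dplyr)
library(stringr)
# Keep values matching the pattern in col1..col4; everything else becomes NA.
# as.character() guards against factor columns (the data.frame() default before R 4.0).
# Note: funs() is deprecated since dplyr 0.8.0.
df_in %>%
  mutate_at(vars(col1:col4),
            funs(ifelse(str_detect(., "google|youtube|bbc"), as.character(.), NA)))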
-output
#    x                         col1                           col2                          col3                    col4
# 1 x1 http://youtube.com/something             https://google.com http://www.bbcnews.com?id=321                    <NA>
# 2 x2                         <NA> http://www.bbcnews2.com?id=321     http://google.com?id=1234 https://www.youtube/com
# 3 x3                         <NA>                           <NA>                          <NA>                    <NA>
# 4 x4                         <NA>        https://google.com/text    https://bbcnews.com/search www.youtube.com/searcht
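On newer dplyr versions the scoped verbs (mutate_at and friends) are superseded and funs() is deprecated, so the same idea can be written with across(). This is only a sketch, not part of the original answer: it assumes dplyr >= 1.0.0, and the names pattern and df_out are introduced here purely for illustration. stringsAsFactors = FALSE is set explicitly so the columns are plain character vectors on any R version.

```r
library(dplyr)
library(stringr)

# Same data as in the question; stringsAsFactors = FALSE keeps the columns as character
df_in <- data.frame(
  x    = c("x1", "x2", "x3", "x4"),
  col1 = c("http://youtube.com/something", "NA", "https://www.yahooexample.com", "https://www.yahooexample2.com"),
  col2 = c("https://google.com", "http://www.bbcnews2.com?id=321", "NA", "https://google.com/text"),
  col3 = c("http://www.bbcnews.com?id=321", "http://google.com?id=1234", "NA", "https://bbcnews.com/search"),
  col4 = c("NA", "https://www.youtube/com", "NA", "www.youtube.com/searcht"),
  stringsAsFactors = FALSE
)

# Hypothetical helper name: the alternation of the three keywords
pattern <- "google|youtube|bbc"

# For each of col1..col4, keep a value only if it matches the pattern;
# anything else (including the literal string "NA") becomes a real NA
df_out <- df_in %>%
  mutate(across(col1:col4, ~ ifelse(str_detect(.x, pattern), .x, NA_character_)))

print(df_out)
```

To touch only the three columns named in the question, col1:col4 can be narrowed to col1:col3; col4 is then left exactly as it was, including its literal "NA" strings.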
Regarding "r - keep only values from strings", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48907934/