I have a data frame in the format shown below:
  String                  Keyword
1 Apples bananas mangoes  mangoes
2 Apples bananas mangoes  bananas
3 Apples bananas mangoes  peach
.....
It is a large data frame (more than 50,000 rows). At the moment I am writing nested ifelse statements by hand, in batches.
data$Result <- ifelse(grepl("apples", data$String, ignore.case = TRUE) == TRUE, "apples",
               ifelse(grepl("bananas", data$String, ignore.case = TRUE) == TRUE, "bananas",
               ifelse(grepl("mangoes", data$String, ignore.case = TRUE) == TRUE, "mangoes", "unavailable")))
String                  Keyword  Result
Apples bananas mangoes  mangoes  mangoes
Apples bananas mangoes  bananas  bananas
Apples bananas mangoes  peach    unavailable
Is there a way to store the strings and keywords in a list and then apply grepl across the whole list?
Best answer
Here is a simple and efficient solution that combines the data.table and stringi packages:
library(data.table)
library(stringi)
setDT(df)[stri_detect_fixed(String, Keyword, case_insensitive = TRUE), result := Keyword]
#                    String Keyword  result
# 1: Apples bananas mangoes mangoes mangoes
# 2: Apples bananas mangoes bananas bananas
# 3: Apples bananas mangoes   peach      NA
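The question asks for "unavailable" where the keyword is not found, whereas the update-by-reference call above leaves those rows as NA. The following is a minimal sketch of one way to close that gap. It assumes a three-row toy data frame named df built to mirror the example; the toy data and the second fill-in step are illustrative additions, not part of the original answer.

```r
library(data.table)
library(stringi)

# Toy data mirroring the question (illustrative; the real table has 50,000+ rows)
df <- data.frame(
  String  = rep("Apples bananas mangoes", 3),
  Keyword = c("mangoes", "bananas", "peach"),
  stringsAsFactors = FALSE
)

setDT(df)  # convert to a data.table by reference

# stri_detect_fixed is vectorized over both arguments, so row i of String
# is checked against row i of Keyword
df[stri_detect_fixed(String, Keyword, case_insensitive = TRUE), result := Keyword]

# Rows without a match were left as NA; label them explicitly
df[is.na(result), result := "unavailable"]

print(df)
```

Keeping the fill-in as a separate step leaves the original one-liner untouched, so it can be dropped if NA is an acceptable marker for "no match".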
Alternatively, a data.table-only version:
library(data.table)
setDT(df)[, result := Keyword[grep(Keyword, String, ignore.case = TRUE)], by = .(Keyword, String)]
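If neither package can be installed, a similar result can probably be had in base R by looping over the distinct keywords instead of over the rows. This is a rough sketch of a different technique, not the answer's method. It assumes the keywords are literal text (not regular expressions) and that there are few distinct keywords compared with the number of rows; the toy df is illustrative.

```r
# Toy data mirroring the question (illustrative)
df <- data.frame(
  String  = rep("Apples bananas mangoes", 3),
  Keyword = c("mangoes", "bananas", "peach"),
  stringsAsFactors = FALSE
)

# Start from the fallback label and overwrite it where the keyword is found
df$result <- "unavailable"

for (kw in unique(df$Keyword)) {
  rows <- which(df$Keyword == kw)
  # fixed = TRUE cannot be combined with ignore.case, so lower-case both sides
  hit <- grepl(tolower(kw), tolower(df$String[rows]), fixed = TRUE)
  df$result[rows[hit]] <- kw
}

print(df)
```

Each grepl call here is vectorized over all rows that share a keyword, so the number of calls grows with the number of distinct keywords rather than with the number of rows.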
Benchmark
Here is a benchmark against the mapply answer on a data set of 5e5 rows. (The for loop answer has still not finished running):
set.seed(123)
df1 <- data.frame(String = rep('Apples bananas mangoes', 5e5),
Keyword = sample(c("mangoes", "bananas", "peach"), 5e5, replace = TRUE))
system.time(df1$result2 <- ifelse(mapply(grepl,df1$Keyword, df1$String, ignore.case = TRUE), as.character(df1$Keyword), "Unavailable"))
#  user  system elapsed
# 40.78    0.02   41.12
system.time(setDT(df1)[stri_detect_fixed(String, Keyword, case_insensitive = TRUE), result3 := Keyword])
#  user  system elapsed
#  0.52    0.01    0.53
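The timings only compare speed, so it may be worth confirming that the two approaches agree row by row before relying on the faster one. This is a small sketch of such a check on a reduced sample. The sample size of 1000 and the names small, hit_mapply and hit_stringi are illustrative choices, not part of the original benchmark.

```r
library(stringi)

set.seed(123)
n <- 1000  # reduced sample so the mapply version finishes quickly

small <- data.frame(
  String  = rep("Apples bananas mangoes", n),
  Keyword = sample(c("mangoes", "bananas", "peach"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Row-by-row regex match, as in the mapply answer (names dropped for comparison)
hit_mapply <- unname(mapply(grepl, small$Keyword, small$String, ignore.case = TRUE))

# Vectorized fixed-string match, as in the stringi answer
hit_stringi <- stri_detect_fixed(small$String, small$Keyword, case_insensitive = TRUE)

# Stops with an error if the two logical vectors differ anywhere
stopifnot(identical(hit_mapply, hit_stringi))
```

One difference to keep in mind: grepl treats each keyword as a regular expression, while stri_detect_fixed treats it as literal text, so the two could disagree for keywords that contain regex metacharacters.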
On the topic of "r - applying grepl across columns by storing the columns in a list?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31431280/