Consider this simple example:
library(stringr)
library(dplyr)
# tibble() replaces the deprecated data_frame()
dataframe <- tibble(text = c('how is the biggest ??',
                             'really amazing stuff'))
# A tibble: 2 x 1
  text
  <chr>
1 how is the biggest ??
2 really amazing stuff
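I need to extract some terms based on a regex expression, but only the longest one. So far, using str_extract, I can only extract the first match (which is not necessarily the longest one).
> dataframe %>% mutate(mymatch = str_extract(text, regex('\\w+')))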
# A tibble: 2 x 2
  text                  mymatch
  <chr>                 <chr>
1 how is the biggest ?? how
2 really amazing stuff  really
I tried using str_extract_all, but could not find a syntax that works. The output should be:
# A tibble: 2 x 2
  text                  mymatch
  <chr>                 <chr>
1 how is the biggest ?? biggest
2 really amazing stuff  amazing
Any ideas?
Thanks!
Best answer
You can do the following:
library(stringr)
library(dplyr)
dataframe %>%
  mutate(mymatch = sapply(str_extract_all(text, '\\w+'),
                          function(x) x[nchar(x) == max(nchar(x))][1]))
Using purrr:
library(purrr)
dataframe %>%
  mutate(mymatch = map_chr(str_extract_all(text, '\\w+'),
                           ~ .[nchar(.) == max(nchar(.))][1]))
Result:
# A tibble: 2 x 2
  text                  mymatch
  <chr>                 <chr>
1 how is the biggest ?? biggest
2 really amazing stuff  amazing
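The same idea can be wrapped in a small helper. This is only a sketch, not part of the original answer: the name longest_match and its pattern argument are made up for illustration. It uses which.max(), which returns the index of the first maximum, and it assumes that a string with no match at all should yield NA.

```r
library(stringr)

# Hypothetical helper (not from the original answer): return the longest
# regex match in each string, or NA when a string has no match.
longest_match <- function(text, pattern = "\\w+") {
  matches <- str_extract_all(text, pattern)
  vapply(matches, function(x) {
    # No match at all: return NA explicitly instead of calling max() on nothing
    if (length(x) == 0) return(NA_character_)
    # which.max() picks the first maximum, so ties go to the earliest match
    x[which.max(nchar(x))]
  }, character(1))
}

longest <- longest_match(c("how is the biggest ??", "really amazing stuff", "??"))
print(longest)
```

Inside a pipeline it could then be used as dataframe %>% mutate(mymatch = longest_match(text)).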
Note:
If there is a tie, the first match is taken.
Data:
# tibble() replaces the deprecated data_frame()
dataframe <- tibble(text = c('how is the biggest ??',
                             'really amazing biggest stuff'))
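To see the tie-breaking in isolation, here is a minimal sketch on the second string of this data, where "amazing" and "biggest" both have seven characters. The variable names are chosen for illustration only.

```r
library(stringr)

# All \w+ matches in the string, as a plain character vector
words <- str_extract_all("really amazing biggest stuff", "\\w+")[[1]]
lens <- nchar(words)

# Every match that reaches the maximum length (both tied words)
all_longest <- words[lens == max(lens)]

# Indexing with [1] keeps only the first of the tied matches
first_longest <- all_longest[1]

print(all_longest)
print(first_longest)
```

Dropping the [1] keeps every tied match, which may be preferable if a list column is acceptable for the result.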
Regarding "r - How to extract the longest match?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50453844/