I have a character vector named cars, shown below:
cars
[1] "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair"
[2] "Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition."
I need to extract the following parts from the strings:
car(52;model-14557)
car(21, model-155)
car ( 36, model-8878)
I tried to extract them with the following:
stringr::str_extract_all(cars, "(.car\\s{0,5}\\(([^]]+)\\))")
This gave me the following output:
[[1]]
[1] " car(52;model-14557) had a good engine(workable condition)"
[[2]]
[1] " car(21, model-155) looked in good condition but car ( 36, model-8878)"
Is there a way to extract the word car together with its associated number and model?
Best answer
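Your regex does not work because you used [^]]+, which matches one or more characters other than ]. That class includes ( and ), so the match runs from the first ( to the last ) that has no ] in between. The leading . also consumes the character before car, which is why each result starts with a space.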
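To make the difference between the two character classes concrete, here is a minimal base R sketch. It uses only the first string from the question; the variable names x, greedy, and bounded are illustrative, and the leading . from the original attempt is left out so that only the character class differs.

```r
# First string from the question, stored in an illustrative variable.
x <- "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair"

# [^]]+ accepts ( and ), so the match extends to the last ) in the string.
greedy <- regmatches(x, regexpr("car\\s*\\([^]]+\\)", x))

# [^()]+ cannot cross a parenthesis, so the match stops at the first ).
bounded <- regmatches(x, regexpr("car\\s*\\([^()]+\\)", x))

print(greedy)   # "car(52;model-14557) had a good engine(workable condition)"
print(bounded)  # "car(52;model-14557)"
```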
Use
> cars <- c("Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair","Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition.")
> library(stringr)
> str_extract_all(cars, "\\bcar\\s*\\([^()]+\\)")
[[1]]
[1] "car(52;model-14557)"
[[2]]
[1] "car(21, model-155)" "car ( 36, model-8878)"
The regex is \bcar\s*\([^()]+\); see the online regex demo.
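It matches:
\b - a word boundary
car - the literal character sequence car
\s* - 0+ whitespace characters
\( - a literal (
[^()]+ - 1 or more characters other than ( and )
\) - a literal )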
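As a rough illustration of the breakdown, the pattern can be assembled piece by piece and tried on a few inputs. The strings in samples are hypothetical and chosen only to exercise the word boundary and the "1 or more" requirement; they are not from the question.

```r
# Assemble the pattern from its individual components.
pattern <- paste0(
  "\\b",      # word boundary
  "car",      # literal "car"
  "\\s*",     # zero or more whitespace characters
  "\\(",      # literal (
  "[^()]+",   # one or more characters other than ( and )
  "\\)"       # literal )
)

# Hypothetical inputs: a normal match, a word that merely ends in "car",
# and empty parentheses.
samples <- c("car ( 36, model-8878)", "scar(1, model-2)", "car()")
matched <- grepl(pattern, samples)
print(matched)  # TRUE FALSE FALSE
```

The second input fails because \b requires car to start a word, and the third fails because [^()]+ needs at least one character between the parentheses.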
Note that the same regex yields the same result with the following base R code:
> regmatches(cars, gregexpr("\\bcar\\s*\\([^()]+\\)", cars))
[[1]]
[1] "car(52;model-14557)"
[[2]]
[1] "car(21, model-155)" "car ( 36, model-8878)"
This question about matching a regex up to the first occurrence of a parenthesis comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/42646882/