Regular expressions with stringr: how to find the first instance of a pattern

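Behind this question is an effort to extract all the references created by knitr and LaTeX. Having found no other way, my idea was to read the .Rnw script into R and use a regular expression to find the references, where the LaTeX syntax is \ref{caption referenced to}. My script has more than 250 references, and some of them are very close to each other.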

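The text.1 example below works, but the text example does not. I think it has to do with R running all the way to the final closing brace. How can I stop at the first closing brace and extract what lies between it and the preceding opening brace?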

library(stringr)
text.1 <- c(" \\ref{test}", "abc", "\\ref{test2}", " \\section{test3}", "{test3")
# In the regular expression below, look behind for "ref{", then grab everything up to a lookahead for } at the end of the string
# braces are special characters and need escaping with double backslashes for R to recognize them as literal braces
# unlist converts the list returned by str_extract_all to a vector

unlist(str_extract_all(string = text.1, pattern = "(?<=ref\\{).*(?=\\}$)"))
[1] "test"  "test2"

# a more complicated string, with more than one set of braces in an element
text <- c("text \ref{?bar labels precision} and more text  \ref{?table column alignment}", "text \ref{?table space} }")

unlist(str_extract_all(string = text, pattern = "(?<=ref\\{).*(?=\\}$)"))
character(0)

Best answer

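The problem with text is that the backslash before "ref" is interpreted by the engine and R's parser as the carriage-return escape \r; so you are trying to match "ref", but the string actually contains (CR + "ef").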
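A small check in base R makes the escape visible. This is only an illustrative sketch; the names s_cr and s_lit are made up for the example and are not part of the question.

```r
# "\ref" in an R string literal: the parser turns \r into a single carriage return
s_cr  <- "\ref{a}"
# "\\ref" keeps a literal backslash followed by "ref"
s_lit <- "\\ref{a}"

nchar(s_cr)                       # 6 characters: CR, e, f, {, a, }
nchar(s_lit)                      # 7 characters: \, r, e, f, {, a, }

charToRaw(s_cr)[1]                # 0d, the byte for a carriage return
grepl("ref", s_cr, fixed = TRUE)  # FALSE: there is no "r" before "ef"
grepl("ref", s_lit, fixed = TRUE) # TRUE
```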

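In addition, * is greedy by default, meaning it matches as much as it can while still allowing the rest of the regular expression to match. Use *? or a negated character class to prevent the greediness.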
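To see the difference side by side, here is a minimal sketch on a made-up string (the variable s and its contents are assumptions for illustration). The backslashes are doubled so the string really contains \ref, which keeps the carriage-return issue out of the picture.

```r
library(stringr)

# Hypothetical input with two references and a stray closing brace at the end
s <- "text \\ref{first} and more text \\ref{second} }"

# Greedy: .* runs on to the last closing brace that still lets the pattern match
greedy <- str_extract_all(s, "(?<=ref\\{).*(?=\\})")[[1]]
length(greedy)   # 1: a single match that spans both references

# Lazy: .*? stops at the first closing brace after each "ref{"
lazy <- str_extract_all(s, "(?<=ref\\{).*?(?=\\})")[[1]]
lazy             # "first" "second"

# Negated character class: [^}]* cannot cross a closing brace at all
negated <- str_extract_all(s, "(?<=ref\\{)[^}]*")[[1]]
negated          # "first" "second"
```

The negated class needs no lookahead and cannot backtrack past a brace, which is why it is often the simpler choice here.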

unlist(str_extract_all(text, '(?<=\ref\\{)[^}]*'))
# [1] "?bar labels precision"   "?table column alignment" "?table space"

As you can see, you can use a character class to match either form (\r or r, followed by "ef"):

x <- c(' \\ref{test}', 'abc', '\\ref{test2}', ' \\section{test3}', '{test3',
       'text \ref{?bar labels precision} and more text  \ref{?table column alignment}', 
       'text \ref{?table space} }')

unlist(str_extract_all(x, '(?<=[\rr]ef\\{)[^}]*'))

# [1] "test"                    "test2"                   "?bar labels precision"  
# [4] "?table column alignment" "?table space"
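For the original goal of scanning an .Rnw file, the carriage-return issue should not arise, because readLines() returns the file's characters as they are and does not process escapes. The following is a sketch under that assumption; the temporary file, its contents, and the names rnw_path, rnw_lines and refs are hypothetical stand-ins for the real script.

```r
library(stringr)

# Hypothetical stand-in for a real .Rnw file: write two lines to a temporary file.
# The doubled backslashes put a single literal backslash into the file.
rnw_path <- tempfile(fileext = ".Rnw")
writeLines(c("See Figure \\ref{fig:bars} and Table \\ref{tab:align}.",
             "Also \\ref{tab:space} }"),
           rnw_path)

# Text read from disk keeps \ref as a backslash followed by "ref"
rnw_lines <- readLines(rnw_path)

# Four backslashes in the R literal become one literal backslash in the regex
refs <- unlist(str_extract_all(rnw_lines, "(?<=\\\\ref\\{)[^}]*"))
refs   # "fig:bars" "tab:align" "tab:space"
```

Anchoring the lookbehind on the backslash also avoids picking up other commands that merely end in "ref{".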

Regarding regular expressions with stringr and how to find the first instance of a pattern, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32811800/
