regex - PHP preg_match_all equivalent

I am looking for the R equivalent of PHP's preg_match_all function.

Goal:

Search a single string (not a vector of multiple strings) for a regex pattern
Return a matrix of the matches

Example:

Assume the following flat string with no delimiters.

"This is a sample string written like a paragraph. In this string two sets of information exist. Each set contains two variables. We want to extract the sets and variables within those sets. Each information set is formatted the same way. The first set is Title: Sir; Last Name: John; and the second set is Title: Mr.; Last Name: Smith."

Using a regex pattern similar to

"Title: ([^;]*?); Last Name: ([^;.]*?)"

I would like to produce the following matrix from the string above:

     [,1]  [,2]
[1,] Sir   John
[2,] Mr.   Smith
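
One way to get this matrix directly from the capture groups, using only base R, is sketched below. It is an illustration, not the asker's code: it assumes a greedy variant of the pattern above, because the original second group ([^;.]*?) is lazy and sits at the very end of the pattern, so it is satisfied by an empty string. The variable names x, pattern, full, parts and res are illustrative.

```r
# The sample string from the question
x <- 'This is a sample string written like a paragraph. In this string two sets of information exist. Each set contains two variables. We want to extract the sets and variables within those sets. Each information set is formatted the same way. The first set is Title: Sir; Last Name: John; and the second set is Title: Mr.; Last Name: Smith.'

# Assumed greedy variant of the question's pattern (the lazy last group would match "")
pattern <- "Title: ([^;]*); Last Name: ([^;.]*)"

# Step 1: all full matches in the single string (like $matches[0] in preg_match_all)
full <- regmatches(x, gregexpr(pattern, x))[[1]]

# Step 2: re-apply the pattern to each full match to extract its capture groups;
# each element is c(full_match, group1, group2)
parts <- regmatches(full, regexec(pattern, full))

# Drop the full match and stack the groups, one row per match
res <- unname(do.call(rbind, lapply(parts, `[`, -1)))
print(res)
```

The two-step approach is used because gregexpr() reports only where whole matches start, while regexec() reports capture groups but only for the first match in each string.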

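I have done this successfully in PHP on a remote server using preg_match_all; however, the text files I am working with are relatively large (not huge, but slow to upload). Doing this in R would save a lot of time.

I have read about grep and related functions in R, but every example I found searches a vector of strings for a pattern, and I have not been able to produce the matrix above.

I have also experimented with the stringr package, but still could not produce the matrix.

This seems like a common task, so I am sure someone smarter than me has already found a solution.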

Best answer

Consider the following option using regmatches:

x <- 'This is a sample string written like a paragraph. In this string two sets of information exist. Each set contains two variables. We want to extract the sets and variables within those sets. Each information set is formatted the same way. The first set is Title: Sir; Last Name: John; and the second set is Title: Mr.; Last Name: Smith.'
# \K discards the text matched so far, so only the value after each label is returned (needs perl = TRUE)
m <- regmatches(x, gregexpr('(?i)Title: \\K[^;]+|Last Name: \\K[^;.]+', x, perl = TRUE))
# Matches alternate Title, Last Name, Title, ..., so fill the matrix row by row
matrix(unlist(m), ncol = 2, byrow = TRUE)

Output:

     [,1]  [,2]   
[1,] "Sir" "John" 
[2,] "Mr." "Smith"
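
On newer R versions there is also gregexec(), which generalizes regexec() to all matches and whose result regmatches() accepts; this is arguably the closest base R analogue of preg_match_all. The sketch below assumes R 4.1.0 or later (where gregexec() was added) and reuses an assumed greedy variant of the question's capture-group pattern; the names pattern, m and res are illustrative.

```r
x <- 'This is a sample string written like a paragraph. In this string two sets of information exist. Each set contains two variables. We want to extract the sets and variables within those sets. Each information set is formatted the same way. The first set is Title: Sir; Last Name: John; and the second set is Title: Mr.; Last Name: Smith.'

# Assumed greedy variant of the question's pattern
pattern <- "Title: ([^;]*); Last Name: ([^;.]*)"

# gregexec() (R >= 4.1.0) records every match together with its capture groups;
# regmatches() turns that into a character matrix with one row per group
# (row 1 = full match) and one column per match
m <- regmatches(x, gregexec(pattern, x))[[1]]

# Drop the full-match row and transpose so each match becomes a row
res <- unname(t(m[-1, , drop = FALSE]))
print(res)
```

Unlike the \K approach, this keeps the two fields tied to the same match, so a set with a missing field cannot shift the remaining values into the wrong column.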

A similar question about "regex - PHP preg_match_all equivalent" can be found on Stack Overflow: https://stackoverflow.com/questions/24418559/
