php - Find last occurrence of href

I'm trying to use a regular expression to find the link that appears just before the string textABCXYZ123 in the HTML below.

lorem ispum...<strong><a href="http://www.site.com/link/123">FIRSTlink</a> </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
... more text........... more text........
... more text.......<strong><a href="http://www.site.com/link/123">other link</a> </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
... more text........... more text........
<strong><a href="http://www.IneedThis.com/link/123">somewhere to go</a> </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>textABCXYZ123</span>
...
... more text..........<strong><a href="http://www.site.com/link/123">other link</a> </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
... more text........... more text........

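There are many links, and I need to capture the one that appears just before the string textABCXYZ123. I tried the regex below, but it returns the first link instead of the last one: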

$find_string = 'ABCXYZ123';
preg_match('#href="(.*)".*text'.$find_string.'#sU',$html,$match);
// so the final result is "http://www.site.com/link/123", which is the first link
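
For anyone who wants to reproduce the behaviour, here is a minimal sketch. It assumes a trimmed-down copy of the HTML above (two links instead of four); the pattern is the one from the attempt, unchanged.

```php
<?php
// Trimmed-down stand-in for the HTML above (assumption: two links are
// enough to show the effect).
$html = <<<'HTML'
<strong><a href="http://www.site.com/link/123">FIRSTlink</a> </strong><br>
<span class=notes_box>ANOTHERLINK</span>.
<strong><a href="http://www.IneedThis.com/link/123">somewhere to go</a> </strong><br>
<span class=notes_box>textABCXYZ123</span>
HTML;

$find_string = 'ABCXYZ123';

// Same pattern as in the attempt: s lets . cross newlines, U makes quantifiers lazy.
preg_match('#href="(.*)".*text' . $find_string . '#sU', $html, $match);

// The engine starts at the leftmost href; the lazy .* after the closing quote
// keeps expanding across the second link until it reaches the marker text.
echo $match[1], PHP_EOL; // http://www.site.com/link/123
```

Nothing in the pattern stops the second .* from running across another href, so the leftmost link always wins.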

Can someone guide me on how to capture the link just before the string textABCXYZ123? P.S. I know about XPath and Simple HTML DOM, but I want to match with a regexp. Thanks for any input.

Best answer

You could try this regex:

href="([^"]*)">(?=(?:(?!href).)*textABCXYZ123)

In PHP, something like this:

$find_string = 'ABCXYZ123';
preg_match('~href="([^"]*)">(?=(?:(?!href).)*text'.$find_string.')~sU',$html,$match);
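
A self-contained sketch of the same idea, for trying it out. The helper name findHrefBefore is hypothetical, and the preg_quote call is an added precaution that is not part of the snippet above; the sample HTML is a shortened copy of the one in the question. The U modifier is left out here because neither quantifier can run past its boundary (a quote or another href), so the captured link is the same either way.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href that
// immediately precedes the marker "text<findString>", or null if none is found.
function findHrefBefore(string $html, string $findString): ?string
{
    // preg_quote is an added precaution so regex metacharacters in the
    // search string are matched literally; '~' is the delimiter in use.
    $needle = preg_quote('text' . $findString, '~');
    $pattern = '~href="([^"]*)">(?=(?:(?!href).)*' . $needle . ')~s';

    return preg_match($pattern, $html, $match) === 1 ? $match[1] : null;
}

// Shortened copy of the HTML from the question.
$html = <<<'HTML'
<strong><a href="http://www.site.com/link/123">FIRSTlink</a> </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
<strong><a href="http://www.IneedThis.com/link/123">somewhere to go</a> </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>textABCXYZ123</span>
... more text..........<strong><a href="http://www.site.com/link/123">other link</a> </strong><br>
1 points| Saved Jan 08, 2014 at 00:49 <span class=notes_box>ANOTHERLINK</span>.
HTML;

echo findHrefBefore($html, 'ABCXYZ123'), PHP_EOL; // http://www.IneedThis.com/link/123
```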


The first part, href="([^"]*)">, shouldn't be too hard to understand. It matches href=", then any number of non-quote characters, followed by a quote and >.

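(?=(?:(?!href).)*textABCXYZ123) is, first of all, a positive lookahead. (A positive lookahead has the format (?= ... ).) It makes sure that what is inside it appears ahead of the match, without consuming any of it.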

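For example, a(?=.*b) matches any a as long as it is followed by any characters and then a b. In other words, it matches an a whenever there is a b somewhere after it.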
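
A quick way to check this is to count matches on a couple of made-up strings (the inputs below are purely illustrative):

```php
<?php
// s lets . cross newlines, matching the modifier used in the answer's pattern.
$pattern = '~a(?=.*b)~s';

// Each "a" matches only if a "b" occurs somewhere after it; the lookahead
// itself consumes nothing, so only the "a" characters are counted.
$count1 = preg_match_all($pattern, 'a-a-b-a', $m1); // 2: the last "a" has no "b" after it
$count2 = preg_match_all($pattern, 'b-a-a', $m2);   // 0: the only "b" comes before every "a"

echo $count1, ' ', $count2, PHP_EOL; // 2 0
```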

So href="([^"]*)"> will match only if there is (?:(?!href).)*textABCXYZ123 ahead of it.

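(?:(?!href).)* is a modified .*: before each character is consumed, the negative lookahead (format (?! ... )) makes sure that no href starts at that position, so the match can never run across another href. You could say a negative lookahead is the opposite of a positive lookahead: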

a(?!.*b) matches any a as long as there is no b anywhere after it.
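
To see the difference the (?!href) guard makes, here is a small sketch on a made-up two-link string (the string and the variable names are illustrative, not from the question). It checks the negative lookahead on its own, then runs the href pattern once with a plain .* and once with the guarded version.

```php
<?php
// Negative lookahead: "a" matches only when no "b" appears anywhere after it.
$negCount = preg_match_all('~a(?!.*b)~s', 'a-a-b-a', $m); // 1: only the final "a"

// Made-up sample string for illustration: two links, marker after the second.
$html = '<a href="first">x</a> <a href="second">y</a> textABCXYZ123';

// With a plain .* the lookahead may run across the second href,
// so the leftmost (first) link already satisfies it.
preg_match('~href="([^"]*)">(?=.*textABCXYZ123)~s', $html, $plain);

// With (?:(?!href).)* every consumed character is checked first, so the
// lookahead cannot cross another href and only the nearest link matches.
preg_match('~href="([^"]*)">(?=(?:(?!href).)*textABCXYZ123)~s', $html, $guarded);

echo $negCount, ' ', $plain[1], ' ', $guarded[1], PHP_EOL; // 1 first second
```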

Regarding "php - Find last occurrence of href", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21001786/
