php - Regular expressions with DOMDocument in PHP

Tags: php regex domdocument

Consider the following PHP snippet:

<?php

$html = <<<DATA
<p>Lorem Ipsum is simply dummy text</p> <p>Lorem Ipsum is <a href="http://www.google.com">simply</a> dummy text</p><a href="http://www.youtube.com/watch?v=DUQi_R4SgWo" target="_blank" rel="noopener">Check out the video here!</a>. <p>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p> <a href="http://www.youtube.com/watch?v=A_6gNZCkajU" target="_blank" rel="noopener">Video here</a> <p>It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
DATA;

# set up the DOM
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

# set up the xpath
$xpath = new DOMXPath($dom);

# set up the regex
$regex = '~\?v=([^&]+)~';

# "//a" searches the whole tree; a bare "a" only matches direct children of the document node
foreach ($xpath->query("//a[contains(@href, 'youtube')]/@href") as $link) {
    preg_match($regex, $link->nodeValue, $matches);
    if ($matches) {
        $id = $matches[1];
        echo "$id\n";
    }
}
?>

This sets up a DOM on the HTML string, fetches the YouTube links with an XPath query, and then applies a regular expression to each one.
The snippet yields:

DUQi_R4SgWo
A_6gNZCkajU


Now I would like to replace the foreach loop with:

$regex = '~\?v=([^&]+)~';

$xpath->registerPHPFunctions();
$xpath->registerNamespace("php", "http://php.net/xpath");
$links = $xpath->query("//a[php:functionString('preg_match', '$regex', @href, '$matches')]/@href");

This finds the same links, but nothing is saved to $matches. Why?

Best answer

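A quick scan of the underlying engine code shows that it does not support passing arguments by reference: the XPath arguments reach preg_match as plain values, so there is no variable for it to fill.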
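There is also a PHP-level reason the attempt cannot work, which a minimal sketch makes visible: inside a double-quoted string, '$matches' is interpolated by PHP before XPath ever sees the expression. The variable is initialised to null here only to avoid an undefined-variable warning; that initialisation is an assumption of this sketch, not part of the question.

```php
<?php
$regex = '~\?v=([^&]+)~';
$matches = null; // assumption: defined up front so the interpolation below is warning-free

// Single quotes inside a double-quoted string do not stop interpolation,
// so both $regex and $matches are replaced by their string values.
$query = "//a[php:functionString('preg_match', '$regex', @href, '$matches')]/@href";

echo $query, "\n";
// prints: //a[php:functionString('preg_match', '~\?v=([^&]+)~', @href, '')]/@href
```

The fourth argument has become an empty string literal, so even an engine that supported references would have nothing to write into.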

To work around this, use your own wrapper:

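$xpath->registerNamespace('php', 'http://php.net/xpath');
// "match" is a reserved word since PHP 8.0, so the wrapper needs a different name
$xpath->registerPHPFunctions('match_video_id');
$links = $xpath->query("//a[php:functionString('match_video_id', @href)]/@href");

function match_video_id($href) {
    $regex = '~\?v=([^&]+)~';
    $rc = preg_match($regex, $href, $matches);
    if ($rc === 1) {
        var_dump($matches[1]); // store this somewhere
    }
    // return a real boolean so the XPath predicate is an unambiguous true/false test
    return $rc === 1;
}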

See it live on 3v4l.org.
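One way to fill in the "store this somewhere" step is to let the wrapper append each captured ID to an array. The following is a sketch under stated assumptions, not the answerer's code: the names collect_video_id and $videoIds are hypothetical, the HTML is shortened to three links, and a global is used only because the function is registered by name.

```php
<?php
// Shortened sample input (assumption): two YouTube links and one unrelated link.
$html = '<div>'
      . '<a href="http://www.youtube.com/watch?v=DUQi_R4SgWo">first</a>'
      . '<a href="http://www.google.com">other</a>'
      . '<a href="http://www.youtube.com/watch?v=A_6gNZCkajU">second</a>'
      . '</div>';

// Hypothetical collector that the wrapper fills while XPath evaluates the predicate.
$videoIds = [];

function collect_video_id($href) {
    global $videoIds;
    if (preg_match('~\?v=([^&]+)~', $href, $matches) === 1) {
        $videoIds[] = $matches[1];
        return true;  // keep this <a> in the result
    }
    return false;     // filter this <a> out
}

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('collect_video_id'); // expose only this one function
$links = $xpath->query("//a[php:functionString('collect_video_id', @href)]/@href");

print_r($videoIds);
```

After the query has run, $links holds the matching href attributes and $videoIds holds the captured IDs in document order. A static method on a small class would avoid the global, at the cost of a slightly longer registration call.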

Regarding php - Regular expressions with DOMDocument in PHP, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47400575/
