php - How to save a DOMDocument's HTML without the HTML wrapper?

Tags: php serialization domdocument

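I have the function below, and I'm struggling to output the DOMDocument without it adding the XML, HTML, body and p tag wrappers around the content. The suggested fix: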

$postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));

only works when the content contains no block-level elements. When it does, as in the example below with an h1 element, the output of saveXML is truncated to...

<p>If you like</p>



function rseo_decorate_keyword($postarray) {
    global $post;
    $keyword = "Jasmine Tea";
    $content = "If you like <h1>jasmine tea</h1> you will really like it with Jasmine Tea flavors. This is the last occurrence of the phrase jasmine tea within the content. If there are other instances of the keyword jasmine tea within the text what happens to jasmine tea.";
    $d = new DOMDocument();
    // Parse the content before querying it
    $d->loadHTML($content);
    $x = new DOMXpath($d);
    // Bail out if the keyword is already inside <b> or <strong>
    $count = $x->evaluate("count(//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$keyword') and (ancestor::b or ancestor::strong)])");
    if ($count > 0) return $postarray;
    // Text nodes containing the keyword, outside headings and bold elements
    $nodes = $x->query("//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$keyword') and not(ancestor::h1) and not(ancestor::h2) and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5) and not(ancestor::h6) and not(ancestor::b) and not(ancestor::strong)]");
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword
        $keynode = $node->splitText(strpos($node->textContent, $keyword));
        // Split after the keyword
        $keynode->splitText(strlen($keyword));
        // Replace keyword with <strong>keyword</strong>
        $replacement = $d->createElement('strong', $keynode->textContent);
        $keynode->parentNode->replaceChild($replacement, $keynode);
    }
    $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));
    //  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->item(1));
    //  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->childNodes);
    return $postarray;
}


All of these answers are now wrong, because as of PHP 5.4 and Libxml 2.6, loadHTML has an $options parameter that tells Libxml how to parse the content.

So if we load the HTML with the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options, then saveHTML() will output no doctype, no <html> and no <body>.

LIBXML_HTML_NOIMPLIED turns off the automatic adding of implied html/body elements.
LIBXML_HTML_NODEFDTD prevents a default doctype being added when one is not found.
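
As a minimal sketch of what that looks like, the snippet below passes both flags to loadHTML and prints the result of saveHTML(). The variable names ($content, $doc, $output) and the sample markup are assumptions made for illustration. The fragment is wrapped in a single <div> on purpose: with LIBXML_HTML_NOIMPLIED, Libxml seems to treat the first element as the document root, so a fragment with several top-level nodes may come back rearranged.

```php
<?php
// Hypothetical sample fragment with one root element (see the note above).
$content = '<div><h1>jasmine tea</h1><p>You will really like it with Jasmine Tea flavors.</p></div>';

$doc = new DOMDocument();

// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD:  do not add a default doctype when none is present.
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// saveHTML() ends its output with a newline, so trim it for comparison.
$output = trim($doc->saveHTML());

echo $output, PHP_EOL;
```

If the content does not have a single root element, one option is to wrap it in a temporary container before loading and strip that container from the output afterwards.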

The full documentation of the Libxml parameters is in the PHP manual, under the predefined constants for libxml.

(Note that the loadHTML documentation says Libxml 2.6 is required, but LIBXML_HTML_NODEFDTD is only available as of Libxml 2.7.8 and LIBXML_HTML_NOIMPLIED as of Libxml 2.7.7.)
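
Given those version requirements, code that has to run on older installations could check for the flags before using them. The following is only a sketch: fragment_flags_available() is a hypothetical helper name, and it assumes that PHP leaves the two constants undefined when the linked Libxml is too old. LIBXML_VERSION is the integer form of the Libxml version (for example 20708 for 2.7.8).

```php
<?php
/**
 * Hypothetical helper: true when both fragment-related flags can be used.
 */
function fragment_flags_available(): bool
{
    return defined('LIBXML_HTML_NOIMPLIED')
        && defined('LIBXML_HTML_NODEFDTD')
        && LIBXML_VERSION >= 20708; // 2.7.8, the higher of the two minimums
}

$doc = new DOMDocument();

// Fall back to the default parsing behaviour (wrappers included) on old Libxml.
$flags = fragment_flags_available()
    ? (LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)
    : 0;

$doc->loadHTML('<p>jasmine tea</p>', $flags);
echo $doc->saveHTML();
```

On an installation where the check fails, the output still contains the doctype, <html> and <body> wrappers, so the caller would need to strip them some other way.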

