php - CSS selectors for parsing HTML in PHP

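I used to parse HTML in Java with jsoup, which can select and parse almost anything. I recently switched to PHP and tried several DOM parsers, but their CSS selectors do not work as expected (or at least not as well as jsoup's). For example, I tried to select the About link in the top-left corner of Google's home page using: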

1. DOMCrawler - Symfony:

$crawler->filter('#hptl > a:nth-child(1)')->each(function ($node) {
    print $node->text()."\n";
});
Result: Empty Page

2. Simple HTML DOM:

require "simple_html_dom.php";

// Create DOM from URL or file
$html = file_get_html("https://google.com");

// Find innertext of about
foreach($html->find("#hptl > a:nth-child(1)") as $element) {
    echo $element->innertext . "<br>";
}
Result: Empty Page

3. phpQuery:

$doc = phpQuery::newDocumentFile('https://google.com');
dd($doc->find("#hptl > a:nth-child(1)")->text());
Result: Empty String

But if I select the same element with jsoup, its CSS selector picks it out without any trouble.

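I have tested different selectors, and in most cases the PHP parsers fail to select the element I want, whereas jsoup succeeds. Here is an example of such a selector: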

div.schedule_table:nth-child(8) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(3) > td:nth-child(2) > p:nth-child(1)

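I usually copy CSS selectors from the browser's developer tools. Am I doing something wrong in this process? If not, is there a better parser for PHP with full CSS selector support?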

Best answer

It looks like the Google landing page has changed somewhat since the OP posted. Even so, I have had good success with QueryPath for similar queries. For example:

<?php
require "vendor/autoload.php";
$qp = html5qp('https://google.com', '#footer > div > div > a:nth-of-type(3)');
print_r($qp->text());

This returns "About Google".
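
To see what a selector of this shape matches without fetching a live page, a similar query can be tried against a small inline document. The following is only a sketch under stated assumptions: it uses PHP's bundled DOM extension with XPath rather than QueryPath, the markup is made up (it is not Google's real HTML), and the helper name `queryTexts` is hypothetical. It relies on `a:nth-of-type(3)` corresponding to the XPath step `a[3]`, while `a:nth-child(1)` corresponds to `*[1][self::a]`.

```php
<?php
// Made-up markup standing in for a page footer (not Google's real HTML).
$html = <<<HTML
<!DOCTYPE html>
<html><body>
<div id="footer"><div><div>
  <span>Links:</span>
  <a href="/ads">Advertising</a>
  <a href="/biz">Business</a>
  <a href="/about">About</a>
</div></div></div>
</body></html>
HTML;

/**
 * Hypothetical helper: run an XPath query against an HTML string
 * and return the trimmed text of every matching node.
 */
function queryTexts(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// CSS equivalent: #footer > div > div > a:nth-of-type(3)
$thirdLink = queryTexts($html, '//*[@id="footer"]/div/div/a[3]');

// CSS equivalent: #footer > div > div > a:nth-child(1)
// The first child element here is a <span>, so nothing matches.
$firstChildLink = queryTexts($html, '//*[@id="footer"]/div/div/*[1][self::a]');

print_r($thirdLink);
print_r($firstChildLink);
```

The contrast between the two queries suggests one reason an `nth-child` selector can come back empty: it counts every sibling element, not just the `<a>` elements, so any difference between the downloaded markup and the browser's DOM shifts the position.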

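Note that the content of Google's landing page depends on the user-agent request header. If you want to match the page you see in your browser, you have to download the page separately, sending an appropriate user-agent request header.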
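
Downloading the page separately with a chosen user-agent can be sketched with PHP's built-in stream functions. This is a minimal sketch, not the answerer's code: the function names `userAgentContextOptions` and `fetchHtml`, the example user-agent string, and the timeout are all illustrative assumptions, and `file_get_contents` on a URL requires `allow_url_fopen` to be enabled.

```php
<?php
/**
 * Hypothetical helper: build stream-context options that send the
 * given User-Agent header with an HTTP(S) GET request.
 */
function userAgentContextOptions(string $userAgent, int $timeoutSeconds = 10): array
{
    return [
        'http' => [
            'method'  => 'GET',
            'header'  => "User-Agent: {$userAgent}\r\n",
            'timeout' => $timeoutSeconds,
        ],
    ];
}

/**
 * Hypothetical helper: download a page with the given User-Agent.
 * Throws a RuntimeException if the request fails.
 */
function fetchHtml(string $url, string $userAgent): string
{
    $context = stream_context_create(userAgentContextOptions($userAgent));
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Could not download {$url}");
    }
    return $html;
}

// Example user-agent string (an assumption; copy the one your browser sends).
$userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36';
$options = userAgentContextOptions($userAgent);

// Usage (performs a network request, so it is left commented out):
// $html = fetchHtml('https://google.com', $userAgent);
```

The returned string can then be handed to whichever parser is in use, so the selector runs against markup closer to what the browser received.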

This post, php - CSS selectors for parsing HTML in PHP, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/48366426/
