I have a query that extracts the posts that have been "liked" more than 5 times:
//div[@class="pin"]
[.//span[@class = "LikesCount"]
[substring-before(normalize-space(text())," ") > 5]]
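I would like to extract and store other information as well, such as the title, the image URL, the likes count, the repins count, etc.
How can I extract all of them?
- Multiple XPath queries?
- Dig into the nodes of the resulting posts while iterating over them with PHP and PHP functions?
- ...
Here is a sample of the markup: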
<div class="pin">
<p class="description">gorgeous couch <a href="#">#modern</a></p>
[...]
<div class="PinHolder">
<a href="/pin/56787645270909880/" class="PinImage ImgLink">
<img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg"
alt="Krizia"
data-componenttype="MODAL_PIN"
class="PinImageImg"
style="height: 288px;">
</a>
</div>
<p class="stats colorless">
<span class="LikesCount">
22 likes
</span>
<span class="RepinsCount">
6 repins
</span>
</p>
[...]
</div>
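Best answer
Since you are already using XPath in your code, I would suggest using XPath to extract that information as well. Here is an example of how to extract the description.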
<?php
// will store the posts as assoc arrays
$mostLikedPostsArr = array();
// load the page; $html is assumed to hold the page markup
// (this replaces a fictional load_html() function)
libxml_use_internal_errors(true); // tolerate HTML that is not well-formed XML
$doc = new DOMDocument();
$doc->loadHTML($html);
// create an XPath selector
$selector = new DOMXPath($doc);
// this is your query from above (with the missing closing bracket added)
$query = '//div[@class="pin"][.//span[@class = "LikesCount"][substring-before(normalize-space(text())," ") > 5]]';
// getting the most liked posts
$mostLikedPosts = $selector->query($query);
// now iterate through the post nodes
foreach ($mostLikedPosts as $post) {
    // assoc array for a post
    $postArr = array();
    // you can do 'relative' queries once you have a reference to $post;
    // note $post as the second parameter to $selector->query()
    // let's extract the description, for example
    $result = $selector->query('p[@class = "description"]', $post);
    // just using nodeValue might be OK for text-only nodes;
    // properly flattening the <a> tags inside the descriptions
    // will take further attention.
    $postArr['description'] = $result->length > 0 ? $result->item(0)->nodeValue : null;
    // ...
    $mostLikedPostsArr[] = $postArr;
}
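To cover all the fields the question asks about, the same pattern can be extended: select the pins once, then evaluate one relative XPath expression per field. The following is a minimal sketch assuming the sample markup from the question; treating the img alt text as the post "title" is an assumption (the sample has no explicit title element), and xpathString() is a hypothetical helper, not part of PHP's DOM API.

```php
<?php
// Sketch: one query selects the pins, then relative expressions pull each field.
// Assumption: the <img> alt attribute stands in for the post "title".

$html = <<<'HTML'
<div class="pin">
  <p class="description">gorgeous couch <a href="#">#modern</a></p>
  <div class="PinHolder">
    <a href="/pin/56787645270909880/" class="PinImage ImgLink">
      <img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg"
           alt="Krizia" class="PinImageImg" style="height: 288px;">
    </a>
  </div>
  <p class="stats colorless">
    <span class="LikesCount">
      22 likes
    </span>
    <span class="RepinsCount">
      6 repins
    </span>
  </p>
</div>
HTML;

// Hypothetical helper: evaluate $expr relative to $context and return its
// string value ('' when nothing matches, instead of a null-node error).
function xpathString(DOMXPath $xpath, string $expr, DOMNode $context): string
{
    return trim((string) $xpath->evaluate("string($expr)", $context));
}

libxml_use_internal_errors(true); // tolerate HTML that is not well-formed XML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$query = '//div[@class="pin"][.//span[@class = "LikesCount"][substring-before(normalize-space(text())," ") > 5]]';

$posts = array();
foreach ($xpath->query($query) as $pin) {
    $posts[] = array(
        // normalize-space() over the whole <p> also flattens the nested <a>
        'description' => xpathString($xpath, 'normalize-space(p[@class="description"])', $pin),
        'title'       => xpathString($xpath, './/img[@class="PinImageImg"]/@alt', $pin),
        'img_url'     => xpathString($xpath, './/img[@class="PinImageImg"]/@src', $pin),
        'pin_url'     => xpathString($xpath, './/a[contains(@class, "PinImage")]/@href', $pin),
        // "22 likes" -> "22" -> 22
        'likes'       => (int) xpathString($xpath, 'substring-before(normalize-space(.//span[@class="LikesCount"]), " ")', $pin),
        'repins'      => (int) xpathString($xpath, 'substring-before(normalize-space(.//span[@class="RepinsCount"]), " ")', $pin),
    );
}

print_r($posts);
```

Wrapping each expression in string() and calling DOMXPath::evaluate() returns a plain string rather than a node list, so a missing field yields an empty string instead of a call on a null item(0).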
A similar question about php - extracting HTML fields via XPath can be found on Stack Overflow: https://stackoverflow.com/questions/13915614/