ruby - How to get links with Mechanize and Nokogiri in Ruby

Given the example below, can anyone tell me how to use Nokogiri and Mechanize to get all the links under each <h4> tag in the different groups, i.e. all the links under:

"Some text"

"Some more text"

"Some additional text"

<div id="right_holder">
    <h3><a href="#"><img src="http://example.com" width="11" height="11"></a></h3>
    <br />
    <br />
    <h4><a href="#">Some text</a></h4>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <br />
    <br />
    <h4><a href="#">Some more text</a></h4>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <br />
    <br />
    <h4><a href="#">Some additional text</a></h4>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
    <a href="#" alt="name of item"><img src="http://some.image.com" class="class1"></a>
</div>

Best answer

In general, you would do something like this:

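# `page` is a Mechanize::Page; #search runs a Nokogiri CSS/XPath query on it.
# 'h4 a' matches the <a> elements nested inside each <h4>.
page.search('h4 a').each do |a|
  puts a[:href]
end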

But I'm sure you've noticed that none of these links actually go anywhere (every href is "#").

Update:

To group them, how about some node-set math:

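# For each <h4>, print its heading text and then the links in its group.
page.search('h4').each do |h4|
  puts h4.text
  # '~ a' = <a> siblings after this <h4>; '~ h4 ~ a' = <a> siblings after a later <h4>.
  (h4.search('~ a') - h4.search('~ h4 ~ a')).each do |a|
    puts a[:href] # each <a> wraps only an <img>, so a.text would print empty lines
  end
end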

That is, every a that follows the h4 but does not also follow another (later) h4.

For "ruby - How to get links with Mechanize and Nokogiri in Ruby", see the similar question on Stack Overflow: https://stackoverflow.com/questions/29706799/