css - Getting HTML attributes from a loop

I have a list of elements:

<div class="item">
    <a href="//external-link.com">
        <img src="main-image.jpg" alt=""/>
    </a>
    <h2> Title </h2>
    <p> Description lorem here </p>
</div>
<div class="item">
    <a href="//external-link.com">
        <img src="main-image.jpg" alt=""/>
    </a>
    <h2> Title </h2>
    <p> Description lorem here </p>
</div>
<div class="item">
    <a href="//external-link.com">
        <img src="main-image.jpg" alt=""/>
    </a>
    <h2> Title </h2>
    <p> Description lorem here </p>
</div>

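I want to extract the text of the <h2> tag, the "href" of the <a> tag and the "src" of the <img> tag, but I can't figure out how to get at the "src" and "href" attributes.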

This is roughly what I'm using:

require 'nokogiri'
require 'open-uri'

pageURL = 'http://ticketdriver.com/amg/buy/tickets'
page = Nokogiri::HTML(open(pageURL), nil, 'UTF-8')

page.css('.item').each do |node|
    title = node.css('h2').text
    srcUrl = node.css('img')['src']
end

The text part works fine, but I can't access the keys and values of the child elements of ".item". I've tried children[0], [0]['src'], [:src], attr(), attribute() and a few others.

I'm completely out of ideas and out of Google search result pages.

Best answer

I'd do something like:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html><body>
    <div class="item">
        <a href="//external-link.com">
            <img src="main-image1.jpg" alt=""/>
        </a>
        <h2> Title1 </h2>
    </div>
    <div class="item">
        <a href="//external-link.com">
            <img src="main-image2.jpg" alt=""/>
        </a>
        <h2> Title2 </h2>
    </div>
    <div class="item">
        <a href="//external-link.com">
            <img src="main-image3.jpg" alt=""/>
        </a>
        <h2> Title3 </h2>
    </div>
</body></html>
EOT

items = doc.search('.item').map { |item|
  {
    title: item.at('h2').text,
    src: item.at('img')['src']
  }
}

Which results in:

items
# => [{:title=>" Title1 ", :src=>"main-image1.jpg"},
#     {:title=>" Title2 ", :src=>"main-image2.jpg"},
#     {:title=>" Title3 ", :src=>"main-image3.jpg"}]

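I deliberately grabbed only the "src" attribute from the <img> tag. Given the code above, you should be able to figure out how to get what you want from the <a> tag.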

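Note that I'm using the generic search instead of css. Nokogiri is smart enough to tell a CSS selector from an XPath selector most of the time. The only time I use css or xpath explicitly is when Nokogiri can't figure it out. I use CSS selectors because they're usually simpler and easier to read.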

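Also, notice that I'm not using node.css('h2').text. css returns a NodeSet, which is akin to an array, while at returns a single Node. Your code glosses over the difference between the two, and calling text on the result of css, xpath or the generic search is a bug in waiting. Consider this: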

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html><body>
  <p>foo</p>
  <p>bar</p>
  <p>baz</p>
</body></html>
EOT

doc.search('p').text # => "foobarbaz"
doc.at('p').text # => "foo"

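This means that if search, or one of its more specific variants, returns a NodeSet, text returns the concatenated text of every node in that set, which is rarely what you want. Instead, use at to find the specific child node you want, then extract its text. How to do that is a different question, but it's easily done.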

Regarding "css - Getting HTML attributes from a loop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33087945/
