ruby - Nokogiri: How to get node names with their namespace prefix

Tags: ruby xml rspec namespaces nokogiri

I'm trying (for testing purposes) to parse a Google Merchant XML feed, defined as:

    <?xml version="1.0" encoding="UTF-8"?>
     <feed xml:lang="cs" xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
      <link rel="alternate" type="text/html" href="http://www.example.com"/>
      <link rel="self" type="application/atom+xml" href="http://www.example.com/cs/feed/google.xml"/>
      <title>EasyOptic</title>
      <updated>2014-08-01T16:31:11Z</updated>
      <entry>
        <title>Sluneční Brýle Producer 1  133a code_color_1 Color 1 133a RayBan</title>
        <link href="http://www.example.com/cs/katalog/price-category-1-style-1-optical-glasses-producer-1-rayban-133a-code_color_1-color-1"/>
        <summary>Moc krásný a velmi levný produkt</summary>
        <updated>2014-08-01T16:31:11Z</updated>
        <g:id>EO111</g:id>
        <g:condition>new</g:condition>
        <g:price>100 Kč</g:price>
        <g:availability>in stock</g:availability>
        <g:image_link>http://www.example.com/images/fallback/default.png</g:image_link>
        <g:additional_image_link>http://www.example.com/images/fallback/default.png</g:additional_image_link>
        <g:brand>Producer 1</g:brand>
        <g:mpn>EO111</g:mpn>
        <g:gender>female</g:gender>
        <g:google_product_category>Apparel &amp; Accessories &gt; Clothing Accessories &gt; Sunglasses</g:google_product_category>
        <g:product_type>Sluneční Brýle </g:product_type>
      </entry>
      <entry>
        <title>Sluneční Brýle Producer 1  133a code_color_1 Color 1 133a RayBan</title>
        <link href="http://www.example.com/cs/katalog/price-category-1-style-1-optical-glasses-producer-1-rayban-133a-code_color_1-color-1"/>
        <summary>Moc krásný a velmi levný produkt</summary>
        <updated>2014-08-01T16:31:10Z</updated>
        <g:id>EO111</g:id>
        <g:condition>new</g:condition>
        <g:price>100 Kč</g:price>
        <g:availability>in stock</g:availability>
        <g:image_link>http://www.example.com/images/fallback/default.png</g:image_link>
        <g:additional_image_link>http://www.example.com/images/fallback/default.png</g:additional_image_link>
        <g:brand>Producer 1</g:brand>
        <g:mpn>EO111</g:mpn>
        <g:gender>female</g:gender>
        <g:google_product_category>Apparel &amp; Accessories &gt; Clothing Accessories &gt; Sunglasses</g:google_product_category>
        <g:product_type>Sluneční Brýle </g:product_type>
      </entry>
    </feed>

using this Ruby script:
     require 'nokogiri'

     def have_node_with_children(body, path_type, path, children_names)
        doc = Nokogiri::XML(body) 

        case path_type
          when :xpath
            nodes = doc.xpath(path)
          when :css
            nodes = doc.css(path)
          else
            nodes = doc.xpath(path)
        end

        nodes.each do |node|
          nchildren_names=[]
          for child in node.children
            nchildren_names << child.name unless child.to_s.strip == "" # Nokogiri treats formatting whitespace as a blank node named "text"
          end

          puts("demanded_nodes: #{children_names.sort.join(", ")} , nodes found: #{nchildren_names.sort.join(", ")} ")

          missing = children_names - nchildren_names
          over = nchildren_names - children_names

          puts("Missing: #{missing.sort.join(", ")} , Over: #{over.sort.join(", ")} ")
        end
     end

      EXPECTED_ENTRY_NODES=[
        'title',
        'link',
        'summary',
        'updated',
        'g:id',
        'g:condition',
        'g:price',
        'g:availability',
        'g:image_link',
        'g:additional_image_link',
        'g:brand',
        'g:mpn',
        'g:gender',
        'g:google_product_category',
        'g:product_type'
        ]


     file=File.open('google.xml')
     have_node_with_children(file.read,:xpath,'//xmlns:entry',EXPECTED_ENTRY_NODES)

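It finds the `entry` nodes (thanks to this tip).
But when it collects their children, `child.name` returns the name without the namespace prefix (e.g. <'g:brand'>.name => 'brand'),
so the comparison with the expected fields fails.
Does anyone know how to get a node's name together with its namespace prefix?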

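If I remove the namespace definitions everything works, but I can't change the original XML.
I use this check in RSpec request tests, so another namespace containing the same base node names could appear.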

Best answer

require 'nokogiri'

xml_doc = Nokogiri::XML(xml) # xml holds the feed shown in the question

xml_doc.xpath("//xmlns:entry").each do |entry|
  # Step through all element nodes that are direct children of <entry>
  entry.xpath("./*").each do |element|
    # namespace is nil for an element outside any namespace, hence the &.
    prefix = element.namespace&.prefix
    puts prefix ? "#{prefix}:#{element.name}" : element.name
  end

  break # only show output for the first <entry>
end

--output:--
title
link
summary
updated
g:id
g:condition
g:price
g:availability
g:image_link
g:additional_image_link
g:brand
g:mpn
g:gender
g:google_product_category
g:product_type

Now, about this:
for child in node.children

A seasoned Rubyist wouldn't use a for loop... because a for loop just calls each(), Rubyists call each() directly:
node.children.each do |child|
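
One visible difference between the two forms is scoping, which this small sketch tries to show: the variable of a `for` loop stays defined after the loop, while a block parameter does not. The method name `loop_variable_visibility` and the sample array are made up for illustration; any Enumerable, including `node.children`, should behave the same way.

```ruby
# Hypothetical demo method: reports whether the loop variable is still
# visible after each kind of loop has finished.
def loop_variable_visibility(children)
  for child in children
    # loop body would go here
  end
  # `for` does not open a new scope, so `child` is still a local variable here.
  for_leaks = !defined?(child).nil?

  children.each do |item|
    # loop body would go here
  end
  # `item` was local to the block, so it is not defined out here.
  each_leaks = !defined?(item).nil?

  { for_leaks: for_leaks, each_leaks: each_leaks }
end

result = loop_variable_visibility(%w[title link summary])
puts "for leaks: #{result[:for_leaks]}, each leaks: #{result[:each_leaks]}"
# prints "for leaks: true, each leaks: false"
```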

Regarding "ruby - Nokogiri: How to get node names with their namespace prefix", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25122219/
