php - 如何使用 PHP 解析带有冒号标记的 XML 节点

我正在尝试从 [此 URL(加载需要相当长的时间)][1] 获取以下节点的值。我感兴趣的元素是:

title, g:price and g:gtin

XML 的开头是这样的:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <title>PhotoSpecialist.de</title>
    <link>http://www.photospecialist.de</link>
    <description/>
    <item>
      <g:id>BEN107C</g:id>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <description>
        Benbo Trekker Mk3 + Kugelkopf + Tasche Das Benbo Trekker Mk3 ist eine leichte Variante des beliebten Benbo 1. Sein geringes Gewicht macht das Trekker Mk3 zum idealen Stativ, wenn Sie viel draußen fotografieren und viel unterwegs sind. Sollten Sie in eine Situation kommen, in der maximale Stabilität zählt, verfügt das Benbo Trekker Mk3 über einen Haken an der Mittelsäule. An diesem können Sie das Stativ mit zusätzlichem Gewicht bei Bedarf beschweren. Dank der zwei besonderen Kamera-Befestigungsschrauben können Sie mit dem Benbo Trekker Mk3 sehr nah am Boden fotografieren. So nah, dass in vielen Fällen die einzige Einschränkung die Größe Ihrer Kamera darstellt. In diesem Set erhalten Sie das Benbo Trekker Mk3 zusammen mit einem Kugelkopf, Socket und einer Tasche für den sicheren und komfortablen Transport.
      </description>
      <link>
        http://www.photospecialist.de/benbo-trekker-mk3-kugelkopf-tasche?dfw_tracker=2469-16
      </link>
      <g:image_link>http://static.fotokonijnenberg.nl/media/catalog/product/b/e/benbo_trekker_mk3_tripod_kit_with_b__s_head__bag_ben107c1.jpg</g:image_link>
      <g:price>199.00 EUR</g:price>
      <g:condition>new</g:condition>
      <g:availability>in stock</g:availability>
      <g:identifier_exists>TRUE</g:identifier_exists>
      <g:brand>Benbo</g:brand>
      <g:gtin>5022361100576</g:gtin>
      <g:item_group_id>0</g:item_group_id>
      <g:product_type>Tripod</g:product_type>
      <g:mpn/>
      <g:google_product_category>Kameras & Optik</g:google_product_category>
    </item>
  ...
  </channel>
</rss>

为此，我编写了以下代码:

$z = new XMLReader;
$z->open('https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml');

$doc = new DOMDocument;

while ($z->read() && $z->name !== 'item')
    ;

while ($z->name === 'item')
{
    $node = new SimpleXMLElement($z->readOuterXML());
    $a = $node->title;
    $b = $node->price;
    $c = $node->gtin;
    echo $a . $b . $c . "<br />";
    $z->next('item');
}

这只返回标题...价格和 gtin 没有显示。

最佳答案

您询问的元素不是默认 namespace 的一部分，而是在另一个 namespace 中。你可以看到，因为他们的名字中有一个前缀，用冒号分隔:

  ...
  <channel>
    <title>PhotoSpecialist.de</title>
    <!-- title is in the default namespace, no colon in the name -->
    ...
    <g:price>199.00 EUR</g:price>
    ...
    <g:gtin>5022361100576</g:gtin>
    <!-- price and gtin are in a different namespace, colon in the name and prefixed by "g" -->
  ...

命名空间带有前缀，在您的例子中为“g”。命名空间代表的前缀在此处的文档元素中定义:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">

所以命名空间是“http://base.google.com/ns/1.0”。

当您像现在一样使用 SimpleXMLElement 按名称访问子元素时:

$a = $node->title;
$b = $node->price;
$c = $node->gtin;

您只在默认命名空间中查找。因此只有第一个元素实际包含文本，另外两个元素是即时创建的，但还是空的。

要访问命名空间的子元素，您需要使用 children() 方法显式地告诉 SimpleXMLElement。它创建一个新的 SimpleXMLElement，其中包含该命名空间中的所有子项，而不是默认的子项:

$google = $node->children("http://base.google.com/ns/1.0");

$a = $node->title;
$b = $google->price;
$c = $google->gtin;

孤立的例子就这么多了(是的，已经是这样了)。

一个完整的例子看起来像(包括阅读器上的节点扩展，你的代码有点生疏):

<?php
/**
 * How to parse an XML node with a colon tag using PHP
 *
 * @link http://stackoverflow.com/q/29876898/367456
 */
const HTTP_BASE_GOOGLE_COM_NS_1_0 = "http://base.google.com/ns/1.0";

$url = 'https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml';

$reader = new XMLReader;
$reader->open($url);

$doc = new DOMDocument;

// move to first item element
while (($valid = $reader->read()) && $reader->name !== 'item') ;

while ($valid) {
    $default    = simplexml_import_dom($reader->expand($doc));
    $googleBase = $default->children(HTTP_BASE_GOOGLE_COM_NS_1_0);
    printf(
        "%s - %s - %s<br />\n"
        , htmlspecialchars($default->title)
        , htmlspecialchars($googleBase->price)
        , htmlspecialchars($googleBase->gtin)
    );

    // move to next item element
    $valid = $reader->next('item');
};

我希望这既能给出解释，又能拓宽对 XMLReader 使用的看法。

关于php - 如何使用 PHP 解析带有冒号标记的 XML 节点，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/29876898/

php - 如何使用 PHP 解析带有冒号标记的 XML 节点

上一篇：java - 无法使用自定义适配器单击 listview android

下一篇：python - 如何使用 pysimplesoap 构造 SOAP 消息？