xml - How to extract multiple XML elements in CSV format using XQuery?

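I am trying to extract multiple elements from an XML file using the string-join function, which works well for a single element. However, when I add another field to my query, the data I get back is incorrect. I suspect I am missing something simple, but I can't seem to find it.

Sample XML data: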

<books>
  <book id="6636551">
    <master_information>
      <book_xref>
        <xref type="Fiction" type_id="1">72771KAM3</xref>
        <xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
      </book_xref>
    </master_information>
    <book_details>
      <price>24.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
    </book_details>
    <global_information>
      <ratings>
        <rating agency="ABC Agency" type="Author Rating">A++</rating>
        <rating agency="DEF Agency" type="Author Rating">A+</rating>
        <rating agency="DEF Agency" type="Book Rating">A</rating>
      </ratings>
    </global_information>
    <country_info>
      <country_code>US</country_code>
    </country_info>
  </book>
  <book id="119818569">
    <master_information>
      <book_xref>
        <xref type="Fiction" type_id="1">070185UL5</xref>
        <xref type="Non_Fiction" type_id="2">US070185UL50</xref>
      </book_xref>
    </master_information>
    <book_details>
      <price>19.25</price>
      <publish_date>2002-11-01</publish_date>
      <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
    </book_details>
    <global_information>
      <ratings>
        <rating agency="ABC Agency" type="Author Rating">A+</rating>
        <rating agency="ABC Agency" type="Book Rating">A</rating>
        <rating agency="DEF Agency" type="Author Rating">A</rating>
        <rating agency="DEF Agency" type="Book Rating">B+</rating>
      </ratings>
    </global_information>
    <country_info>
      <country_code>CA</country_code>
    </country_info>
  </book>
</books>

XQuery used to pull a single element:

for $x in string-join(('book_id,book_price', //book/book_details/price/string-join((ancestor::book/@id, .), ',')), '&#10;')
return $x

This works well and produces the following output:

book_id,book_price
6636551,24.95
119818569,19.25

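The question is: how can I extract multiple elements, or a combination of elements and attributes, from a single XML file (possibly still using string-join)?

I tried the following, which works in most cases, but I noticed that on larger data sets values seem to end up in the wrong columns at random. For example, with the code below, if ./publish_date is empty in the data, the ./description value ends up in the book_pub_date column.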

for $x in string-join(('book_id,book_price,book_pub_date,book_desc', //book/book_details/string-join((ancestor::book/@id, ./price, ./publish_date, ./description), ',')), '&#10;')
return $x

FYI, I am still learning XQuery. I appreciate any insights or help!

Best answer

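Sequences in XQuery are flattened: the expressions (1, (2, 3), ((4)), (), 5) and (1, 2, 3, 4, 5) are equivalent. This means that the sequence (ancestor::book/@id, ./price, ./publish_date, ./description) varies in length whenever one of the XPath subexpressions returns no result. Because fn:string-join($strings, $sep) simply inserts the separator between each pair of adjacent items in the (flattened) $strings, the resulting rows can contain different numbers of commas.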
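To make the effect concrete, here is a minimal sketch that reproduces the misalignment without any XML input. The variable names and the literal row values are illustrative assumptions (the description is shortened); the empty sequence stands in for a missing publish_date.

```xquery
(: Nested and empty sequences disappear when a sequence is constructed. :)
let $flat := (1, (2, 3), ((4)), (), 5)
(: A row whose publish_date is missing: the () contributes no item. :)
let $row := ('6636551', '24.95', (), 'An in-depth look')
return (
  count($flat),            (: 5 :)
  count($row),             (: 3, not 4 :)
  string-join($row, ',')   (: 6636551,24.95,An in-depth look :)
)
```

The joined row has only two commas, so a CSV reader sees the description in the third column.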

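To keep the CSV columns aligned, you can insert an empty string wherever a value is missing. A simple way to do that is to take advantage of flattening: ($possibly-empty, '')[1]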

If $possibly-empty contains one item (e.g. 'foo'), this evaluates to ('foo', '')[1] -> 'foo'.
If it is the empty sequence (), the expression evaluates to ((), '')[1] -> ('')[1] (flattening) -> ''.
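The two cases can be checked in isolation with a small sketch. The variable names and literal values are assumptions chosen for illustration only.

```xquery
let $missing := ()              (: stands in for an absent publish_date :)
let $present := '2000-10-01'
return string-join(
  (
    '6636551',
    ($missing, '')[1],   (: falls back to '' :)
    ($present, '')[1]    (: keeps '2000-10-01' :)
  ),
  ','
)
(: result: 6636551,,2000-10-01 :)
```

Note that the [1] predicate also means that if a subexpression ever returns more than one item, only the first is kept, so each column still contributes exactly one value.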

Working example (your enclosing FLWOR expression (for/return) is entirely redundant, since you only iterate over a single string item, so I have omitted it):

string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    //book/book_details/string-join(
      (
        (ancestor::book/@id, '')[1],
        (./price, '')[1],
        (./publish_date, '')[1],
        (./description, '')[1]
      ),
      ','
    )
  ),
  '&#10;'
)

You can also abstract this into a function of its own:

declare function local:non-empty($possibly-empty) {
  ($possibly-empty, '')[1]
};

string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    //book/book_details/string-join(
      (
        local:non-empty(ancestor::book/@id),
        local:non-empty(./price),
        local:non-empty(./publish_date),
        local:non-empty(./description)
      ),
      ','
    )
  ),
  '&#10;'
)
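
One related caveat: the description values in the sample data contain commas themselves, which would also shift columns for any CSV consumer. Below is a hedged sketch of a variant that quotes such fields. It assumes RFC 4180-style quoting (wrap the field in double quotes and double any embedded quotes); local:csv-field is a hypothetical helper name, not part of the original answer, and it expects each subexpression to return at most one item.

```xquery
(: Hypothetical helper: empty-safe, and quoted when the value needs it. :)
declare function local:csv-field($value as item()?) as xs:string {
  (: Fall back to '' for the empty sequence, as in the idiom above. :)
  let $s := string(($value, '')[1])
  return
    (: Quote only fields containing a comma, a double quote, or a line break. :)
    if (matches($s, '[,"\n\r]'))
    then concat('"', replace($s, '"', '""'), '"')
    else $s
};

string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    //book/book_details/string-join(
      (
        local:csv-field(ancestor::book/@id),
        local:csv-field(price),
        local:csv-field(publish_date),
        local:csv-field(description)
      ),
      ','
    )
  ),
  '&#10;'
)
```

Whether your consumer expects this quoting style is something to verify against the tool that reads the file.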

A similar question on Stack Overflow: https://stackoverflow.com/questions/49756285/
