xml - Getting multiple child nodes from the same node as a block with xmlstarlet

Tags: xml linux xpath xmlstarlet

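I have an XML file that contains multiple user entries, each with some user data such as name, email and other fields. It looks like extracting these values can be done with multiple --value-of (-v) arguments, like this: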

$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t --nl -v "//n:title" -v "//n:email" ~/tests/test-xml.xml

Some user
Some user #2
Some user #3some.user@example.com
some.user2@example.com
some.user3@example.com

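But the values do not end up together: the tool appears to process all <title> elements first and then all <email> elements. I would like the following format: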

Some user
some.user@example.com
Some user #2
some.user2@example.com
...

I found that I need the XPath function concat for this. Now the values are at least separated by a comma:

$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "concat(current()//n:title, ',', current()//n:email)" ~/tests/test-xml.xml
Some user,some.user@example.comSome user #2,some.user2@example.comSome user #3,some.user3@example.com

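This is essentially what I need, but when I use \n as the separator instead of , it prints a literal \n rather than a line break. The same happens with \\n and \r\n. As a workaround, sed can replace the comma afterwards: sed 's/,/\n/g'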

However, this does not fix the missing line break between some.user@example.com and Some user #2:

$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "concat(current()//n:title, ',', current()//n:email)" ~/tests/test-xml.xml | sed 's/,/\n/g'
Some user
some.user@example.comSome user #2
some.user2@example.comSome user #3
some.user3@example.com

How can I achieve this? I would prefer a solution without the extra sed command, if that makes sense and is possible.

Workaround

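The only workaround I found is to nest the expression in another concat call that appends one more character, marking where another line break is needed, so that it can also be replaced with \n, like this: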

$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "concat(concat(current()//n:title, ',', current()//n:email), '|', '')" ~/tests/test-xml.xml | sed -E 's/[,|]+/\n/g'
Some user
some.user@example.com
Some user #2
some.user2@example.com
Some user #3
some.user3@example.com

Although this works, it feels like a nasty workaround to me. I wonder whether there is a cleaner way to do this, probably one that requires deeper experience with xmlstarlet and perhaps XPath.
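
One reason the delimiter-based workaround is fragile: it splits on every comma or pipe, including any that occur inside the data. The following is a minimal sketch of that effect. The two input strings are hypothetical stand-ins for the concat() output (they are not produced by xmlstarlet here), and \n in the sed replacement assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical strings standing in for the concat() output of xmlstarlet.
clean='Some user,some.user@example.com|Some user #2,some.user2@example.com|'
tricky='User, Some,some.user@example.com|'   # made-up title that contains a comma

# Same substitution as in the workaround (\n in the replacement assumes GNU sed).
split() { printf '%s' "$1" | sed -E 's/[,|]+/\n/g'; }

# Command substitution drops the trailing newline left by the final '|'.
clean_out=$(split "$clean")
tricky_out=$(split "$tricky")

# The first block has one value per line; the second breaks the title in two.
printf '%s\n---\n%s\n' "$clean_out" "$tricky_out"
```

With the made-up title, the single entry comes out as three lines instead of two, which is the kind of surprise a tool-level solution avoids.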

Test XML document

<?xml version="1.0" encoding="UTF-8"?>
<feed
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:app="http://www.w3.org/2007/app"
  xmlns:snx="http://www.ibm.com/xmlns/prod/sn"
  xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <opensearch:totalResults
    xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">67
  </opensearch:totalResults>
  <opensearch:startIndex
    xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">1
  </opensearch:startIndex>
  <opensearch:itemsPerPage
    xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">100
  </opensearch:itemsPerPage>

  <entry>
    <title>Some user</title>
    <contributor>
      <email>some.user@example.com</email>
    </contributor>
  </entry>

  <entry>
    <title>Some user #2</title>
    <contributor>
      <email>some.user2@example.com</email>
    </contributor>
  </entry>

  <entry>
    <title>Some user #3</title>
    <contributor>
      <email>some.user3@example.com</email>
    </contributor>
  </entry>

</feed>

Best answer

The simplest way is to output a newline (--nl) after each value:

xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "n:title" --nl -v "n:contributor/n:email" --nl input.xml

However, this prints an extra newline at the end of the output:

Some user
some.user@example.com
Some user #2
some.user2@example.com
Some user #3
some.user3@example.com
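
If the output is captured in a shell variable rather than printed directly, the trailing newline may not matter, because POSIX command substitution strips all trailing newlines. A minimal sketch, in which printf stands in for the xmlstarlet call (an assumption made so the snippet runs without the tool or the input file):

```shell
#!/bin/sh
# printf stands in for the xmlstarlet command; the final \n\n mimics
# output that ends with an extra trailing newline.
captured=$(printf 'Some user\nsome.user@example.com\n\n')

# Command substitution removed every trailing newline, so the closing
# bracket follows the last character of the email address directly.
printf '[%s]\n' "$captured"
```

This only helps when the result is post-processed in the shell; when the command writes straight to a file or a pipe, the trailing newline is still there.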

Another way is to output the newline before each entry unless it is the first one, using -i (xsl:if) and -b (break out of the nesting):

xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -i "position() > 1" --nl -b -v "n:title" --nl -v "n:contributor/n:email" input.xml

Output:

Some user
some.user@example.com
Some user #2
some.user2@example.com
Some user #3
some.user3@example.com
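
To try the second variant without the original file, the following sketch writes a reduced copy of the test feed (two entries, Atom namespace only) to a temporary file and runs the same options against it. The temporary file, the reduced feed and the variable names are assumptions for illustration; the script stops early when xmlstarlet is not installed.

```shell
#!/bin/sh
# Stop early when the tool is missing, so the sketch does not fail halfway.
command -v xmlstarlet >/dev/null 2>&1 || {
  echo "xmlstarlet not installed, skipping" >&2
  exit 0
}

# Reduced stand-in for the test document (hypothetical temp file).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Some user</title>
    <contributor><email>some.user@example.com</email></contributor>
  </entry>
  <entry>
    <title>Some user #2</title>
    <contributor><email>some.user2@example.com</email></contributor>
  </entry>
</feed>
EOF

# For each entry: newline first unless it is the first entry,
# then the title, a newline, and the email.
result=$(xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t \
  -m "//n:entry" \
  -i "position() > 1" --nl -b \
  -v "n:title" --nl \
  -v "n:contributor/n:email" \
  "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

The paths n:title and n:contributor/n:email are relative to the entry matched by -m, which is what keeps each title next to its own email.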

Regarding "xml - Getting multiple child nodes from the same node as a block with xmlstarlet", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64594096/
