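I have an XML file that contains multiple user entries, each with some user data such as name, email, and other fields. It seems this can be extracted using multiple --value-of (-v) arguments, like this: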
$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t --nl -v "//n:title" -v "//n:email" ~/tests/test-xml.xml
Some user
Some user #2
Some user #3some.user@example.com
some.user2@example.com
some.user3@example.com
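But the values are not grouped together: the tool appears to process all <title> elements first and then all <email> elements. I would like the following format: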
Some user
some.user@example.com
Some user #2
some.user2@example.com
...
I found that I need the XPath function concat for this. Now I at least get them separated by a comma:
$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "concat(current()//n:title, ',', current()//n:email)" ~/tests/test-xml.xml
Some user,some.user@example.comSome user #2,some.user2@example.comSome user #3,some.user3@example.com
This is close to what I need, but when I use \n as the separator instead of a comma, it just prints a literal \n rather than a line break. The same happens with \\n and \r\n. As a workaround, sed can replace the comma afterwards: sed 's/,/\n/g'
However, that does not fix the missing line break in some.user@example.comSome user #2:
$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "concat(current()//n:title, ',', current()//n:email)" ~/tests/test-xml.xml | sed 's/,/\n/g'
Some user
some.user@example.comSome user #2
some.user2@example.comSome user #3
some.user3@example.com
How can I achieve this? I would prefer a solution without the extra sed command, if that makes sense and is possible.
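Workaround
The only workaround I found is to nest it in another concat call that appends a further character marking where the next line break belongs, which can then also be replaced with \n, like this: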
$ xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "concat(concat(current()//n:title, ',', current()//n:email), '|', '')" ~/tests/test-xml.xml | sed -E 's/[,|]+/\n/g'
Some user
some.user@example.com
Some user #2
some.user2@example.com
Some user #3
some.user3@example.com
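Although this works, it feels like a nasty workaround to me. I wonder whether there is a cleaner way to do this. I guess it is possible with more experience of xmlstarlet and maybe XPath.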
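As a side note on the post-processing step: as far as I know, \n in the replacement part of a sed s command is a GNU extension that POSIX leaves unspecified, so the sed calls above may behave differently on BSD or macOS. A minimal sketch of a more portable variant, assuming the same , and | marker characters, uses tr instead. The input string here is a hypothetical stand-in for the xmlstarlet output.

```shell
#!/bin/sh
# Hypothetical stand-in for the concat(...) output: ',' separates title and
# email, '|' marks the end of each entry.
raw='Some user,some.user@example.com|Some user #2,some.user2@example.com|'

# tr maps every ',' and every '|' to a newline, one character at a time.
lines=$(printf '%s' "$raw" | tr ',|' '\n\n')
printf '%s\n' "$lines"
```

This still needs the extra pipeline stage, so it only makes the workaround more portable, not cleaner.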
Test XML document
<?xml version="1.0" encoding="UTF-8"?>
<feed
xmlns="http://www.w3.org/2005/Atom"
xmlns:app="http://www.w3.org/2007/app"
xmlns:snx="http://www.ibm.com/xmlns/prod/sn"
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<opensearch:totalResults
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">67
</opensearch:totalResults>
<opensearch:startIndex
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">1
</opensearch:startIndex>
<opensearch:itemsPerPage
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">100
</opensearch:itemsPerPage>
<entry>
<title>Some user</title>
<contributor>
<email>some.user@example.com</email>
</contributor>
</entry>
<entry>
<title>Some user #2</title>
<contributor>
<email>some.user2@example.com</email>
</contributor>
</entry>
<entry>
<title>Some user #3</title>
<contributor>
<email>some.user3@example.com</email>
</contributor>
</entry>
</feed>
Best answer
The simplest way is to match each entry and output a newline (--nl) after each value within it:
xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -v "n:title" --nl -v "n:contributor/n:email" --nl input.xml
But this emits an extra newline at the end of the output:
Some user
some.user@example.com
Some user #2
some.user2@example.com
Some user #3
some.user3@example.com
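The same per-entry template can also emit literal text between values with -o (--output), which avoids concat altogether. The following is only a sketch under a few assumptions: xmlstarlet is installed, and the feed is a trimmed two-entry copy of the test document written to a temporary file.

```shell
#!/bin/sh
# Sketch: requires xmlstarlet on PATH. The feed below is a trimmed,
# two-entry copy of the test document from the question.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Some user</title><contributor><email>some.user@example.com</email></contributor></entry>
  <entry><title>Some user #2</title><contributor><email>some.user2@example.com</email></contributor></entry>
</feed>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # Inside -m, each -v / -o / --nl is emitted once per matched entry, in order.
  csv=$(xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t \
        -m "//n:entry" -v "n:title" -o "," -v "n:contributor/n:email" --nl "$tmp")
  printf '%s\n' "$csv"
else
  echo "xmlstarlet not found; skipping" >&2
fi
```

With the two sample entries this should print one comma-separated line per entry.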
Another way is to output a newline before each entry unless it is the first one, using -i (xsl:if) and -b (break nesting):
xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t -m "//n:entry" -i "position() > 1" --nl -b -v "n:title" --nl -v "n:contributor/n:email" input.xml
Output:
Some user
some.user@example.com
Some user #2
some.user2@example.com
Some user #3
some.user3@example.com
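The -i "position() > 1" ... -b pattern is not specific to newlines; it works for any separator that should appear only between entries. Here is a minimal sketch that joins the titles with a semicolon, assuming xmlstarlet is installed and using a trimmed two-entry copy of the test document:

```shell
#!/bin/sh
# Sketch: requires xmlstarlet on PATH. The feed below is a trimmed,
# two-entry copy of the test document from the question.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Some user</title><contributor><email>some.user@example.com</email></contributor></entry>
  <entry><title>Some user #2</title><contributor><email>some.user2@example.com</email></contributor></entry>
</feed>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # -i opens an xsl:if, -o emits the separator, -b closes the xsl:if again,
  # so the ';' is written before every entry except the first.
  joined=$(xmlstarlet sel -N n="http://www.w3.org/2005/Atom" -t \
           -m "//n:entry" -i "position() > 1" -o ";" -b -v "n:title" "$tmp")
  printf '%s\n' "$joined"
else
  echo "xmlstarlet not found; skipping" >&2
fi
```

Because the separator is emitted before each entry rather than after it, nothing trails the last value.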
This Q&A on getting multiple child nodes from the same node as a block with xmlstarlet is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/64594096/