XSLT: remove arbitrary duplicate sibling elements

Tags: xslt

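The answer here does what I want, except that I don't want to remove duplicate siblings of only one specific element. I want to remove duplicate siblings of every element.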

Also, for my purposes, a "duplicate" element is one that has the same attributes, descendant elements, and text as one of its siblings.

How can I modify that answer to achieve this?

Here is my current stylesheet:

XSL:

<!--
    When a file is transformed using this stylesheet the output will be
    formatted as follows:

    1.)  Elements named "info" will be removed
    2.)  Duplicate sibling elements will be removed
    3.)  Attributes named "file_line_nr" or "file_name" will be removed
    4.)  Comments will be removed
    5.)  Processing instructions will be removed
    6.)  XML declaration will be removed
    7.)  Extra whitespace will be removed
    8.)  Empty attributes will be removed
    9.)  Elements which have no attributes, child elements, or text will be removed
    10.) All elements will be sorted by name recursively
    11.) All attributes will be sorted by name
-->
<xsl:stylesheet
    version="1.0"
    xmlns:xalan="http://xml.apache.org/xalan"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes" method="xml" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <!--
        Elements/attributes to remove.  Note that comments are not elements or
        attributes.  Since there is no template to match comments they are
        automatically ignored.
    -->
    <xsl:template match="@*[normalize-space()='']|info|@file_line_nr|@file_name"/>

    <!-- Match any attribute -->
    <xsl:template match="@*">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Match any element -->
    <xsl:template match="*">
        <xsl:variable name="elementFragment">
            <xsl:copy>
                <xsl:apply-templates select="@*">
                    <xsl:sort select="name()"/>
                </xsl:apply-templates>
                <xsl:apply-templates>
                    <xsl:sort select="name()"/>
                </xsl:apply-templates>
            </xsl:copy>
        </xsl:variable>
        <xsl:variable name="element" select="xalan:nodeset($elementFragment)/*"/>
        <xsl:if test="$element/@* or $element/* or normalize-space($element)">
            <xsl:copy-of select="$element"/>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

Input XML:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?><!-- XML declaration should be removed -->
<z b="b" a="a" c="c">
    <?some-app inst="some instruction"?><!-- Processing instructions should be removed -->
    <a><!-- Keep elements like this because it has child elements -->
        <x c="c" b="b" a="a"/><!-- Keep elements like this because it has attributes -->
        <c>some text</c><!-- Keep elements like this because it has text -->
        <info a="a"/><!-- Elements named "info" are to be removed -->
        <w file_line_nr="42" file_name="somefile.txt"/><!-- Attributes named "file_line_nr" and "file_name" are to be removed which will leave this element empty, so it should be removed too -->
        <d/><!-- Remove elements like this because it has no attributes, no children, and no text -->

        <v a="a"><!-- Keep this element because it and its sibling "v" element are unique. It does not have exactly the same descendants as its sibling "v" element -->
            some text
            <i a="a">some text</i>
            <q a="a">some text</q>
        </v>
        <v a="a">
            some text
            <i a="a">some different text</i><!-- text is different -->
            <q a="a">some text</q>
        </v>

        <e a="a"><!-- Keep this element because it and its sibling "e" element are unique. It does not have exactly the same descendants as its sibling "e" element -->
            some text
            <j a="a">
                <p>some text</p>
            </j>
        </e>
        <e a="a">
            some text
            <j a="a">
                <p>some different text</p><!-- text is different -->
            </j>
        </e>

        <u a="a"><!-- Keep this element because it and its sibling "u" element are unique. It does not have exactly the same descendants as its sibling "u" element -->
            some text
            <k a="a">some text</k>
            <n a="a">some text</n>
        </u>
        <u a="a">
            some text
            <k b="b">some text</k><!-- attribute is different -->
            <n a="a">some text</n>
        </u>

        <f a="a"><!-- Keep this element because it and its sibling "f" element are unique. It does not have exactly the same attributes as its sibling "f" element -->
            some text
            <l a="a">some text</l>
            <m a="a">some text</m>
        </f>
        <f b="b"><!-- attribute is different -->
            some text
            <l a="a">some text</l>
            <m a="a">some text</m>
        </f>

        <t a="a"><!-- Keep this element because it and its sibling "t" element are unique. It does not have exactly the same text as its sibling "t" element -->
            some text
            <az a="a">some text</az>
            <aa a="a">some text</aa>
        </t>
        <t a="a">
            some different text<!-- text is different -->
            <az a="a">some text</az>
            <aa a="a">some text</aa>
        </t>

        <g a="a"><!-- Remove this element because it is NOT unique. Its attributes, descendants, and text are exactly the same as its sibling "g" element -->
            some text
            <ay a="a">some text</ay>
            <ab a="a">some text</ab>
        </g>
        <g a="a">
            some text
            <ay a="a">some text</ay>
            <ab a="a">some text</ab>
        </g>

        <s a="a"/>
    </a>
    <y a="a"/>
    <b>
        <h a="a" />
        <r a="a"/>
    </b>
</z>

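Desired output XML: (Elements and attributes are sorted. Comments and indentation/whitespace would also be removed, but I have added them back here for readability.)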

<z a="a" b="b" c="c">
    <a>
        <c>some text</c>
        <e a="a">
            some text
            <j a="a">
                <p>some text</p>
            </j>
        </e>
        <e a="a">
            some text
            <j a="a">
                <p>some different text</p>
            </j>
        </e>
        <f a="a">
            some text
            <l a="a">some text</l>
            <m a="a">some text</m>
        </f>
        <f b="b">
            some text
            <l a="a">some text</l>
            <m a="a">some text</m>
        </f>
        <g a="a"><!-- The sibling "g" element of this element was removed because it was an exact duplicate -->
            some text
            <ab a="a">some text</ab>
            <ay a="a">some text</ay>
        </g>
        <s a="a"/>
        <t a="a">
            some text
            <aa a="a">some text</aa>
            <az a="a">some text</az>
        </t>
        <t a="a">
            some different text
            <aa a="a">some text</aa>
            <az a="a">some text</az>
        </t>
        <u a="a">
            some text
            <k a="a">some text</k>
            <n a="a">some text</n>
        </u>
        <u a="a">
            some text
            <k b="b">some text</k>
            <n a="a">some text</n>
        </u>
        <v a="a">
            some text
            <i a="a">some text</i>
            <q a="a">some text</q>
        </v>
        <v a="a">
            some text
            <i a="a">some different text</i>
            <q a="a">some text</q>
        </v>
        <x a="a" b="b" c="c"/>
    </a>
    <b>
        <h a="a"/>
        <r a="a"/>
    </b>
    <y a="a"/>
</z>

Best answer

Here is my suggestion, intended to show how deep-equal and XSLT 2.0 can help:

<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes" method="xml" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- identity for most attributes -->
    <xsl:template match="@*">
        <xsl:copy/>
    </xsl:template>

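    <xsl:template match="*">
        <xsl:copy>
            <!-- process the attributes, sorted by name -->
            <xsl:apply-templates select="@*">
                <xsl:sort select="local-name()"/>
            </xsl:apply-templates>
            <!-- skip comments and processing instructions, then group adjacent
                 children by whether they are elements (key true) or not (key false) -->
            <xsl:for-each-group select="node() except (processing-instruction(), comment())" group-adjacent="boolean(self::*)">
                <xsl:choose>
                    <!-- a run of elements: process them sorted by name -->
                    <xsl:when test="current-grouping-key()">
                        <xsl:apply-templates select="current-group()">
                            <xsl:sort select="local-name()"/>
                        </xsl:apply-templates>
                    </xsl:when>
                    <!-- a run of text nodes: keep them where they are -->
                    <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>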

    <!--
        Elements/attributes to remove.
    -->
    <xsl:template match="@*[normalize-space()='']|info|@file_line_nr|@file_name
                         | *[not(@* | node())]"/>


    <!-- remove (well, don't copy) element nodes which are deep-equal to
         a preceding sibling element 
    -->
    <xsl:template match="*[some $ps in preceding-sibling::* satisfies deep-equal(., $ps)]"/>


</xsl:stylesheet>
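
To isolate the deduplication step from the sorting and filtering, here is a minimal sketch that keeps only an identity template plus the deep-equal template from the stylesheet above. It assumes an XSLT 2.0 processor (for example Saxon), because both the `some ... satisfies` expression and `deep-equal()` are XPath 2.0 features. It is a reduced illustration, not a replacement for the full stylesheet.

```xml
<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Discard whitespace-only text nodes so indentation does not take part
         in the comparison -->
    <xsl:strip-space elements="*"/>

    <!-- Identity template: copy every node unchanged by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: an element that is deep-equal to any preceding
         sibling element produces no output, so only the first copy survives.
         The predicate gives this rule a higher default priority than the
         identity template. -->
    <xsl:template match="*[some $ps in preceding-sibling::* satisfies deep-equal(., $ps)]"/>

</xsl:stylesheet>
```

Applied to `<a><g a="a">x</g><g a="a">x</g><g b="b">x</g></a>`, this keeps the first `g a="a"` and the `g b="b"` and drops the second `g a="a"`. Two properties of `deep-equal()` are worth keeping in mind. It ignores attribute order as well as comments and processing instructions inside the compared elements, but the order of child elements and text nodes matters. The comparison is also made on the input nodes, before any sorting or attribute removal, so siblings that differ only in child order or in a stripped attribute such as `file_line_nr` are not treated as duplicates.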

Regarding removing arbitrary duplicate sibling elements with XSLT, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18772034/
