I need to transform large XML files that have a nested (hierarchical) structure of the form
<Root>
Flat XML
Hierarchical XML (multiple blocks, some repetitive)
Flat XML
</Root>
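into a flatter ("shredded") form, with one block for each repeated nested block.
The data has many different tags and hierarchy variations (especially in the number of tags of the shredded XML before and after the hierarchical XML), so ideally no assumptions should be made about tag and attribute names or about the hierarchy level.
A top-level view of the hierarchy, for just 4 levels, would look something like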
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
and the desired output would then be
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
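That is, if at each level i there are Li different components, a total of Product(Li) different components will be produced (just 2 above, since the only differentiating factor is level 4, so L1*L2*L3*L4 = 1*1*1*2 = 2).
From what I have seen, XSLT may be the way to go, but any other solution (e.g. StAX or even JDOM) would do.
A more detailed example, using fictitious information, would be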
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
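The data above should be shredded into 5 blocks (i.e., one for each distinct <Job> block), each of which keeps all other tags identical and contains just a single <Job> element. So, given the 5 different <Job> blocks in the example above, the transformed ("shredded") XML would be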
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
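As a quick sanity check on the expected number of output blocks, the count can be taken directly from the repeated elements: 3 `Job` elements under the US employment plus 2 under the UK one gives 5. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the Employee document; the names `SAMPLE`, `count_leaves`, and `leaves_per_branch` are made up for this illustration and are not part of the question.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Employee document above: only the elements that
# matter for counting are kept (an assumption made for brevity).
SAMPLE = """
<Employee name="A Name">
  <EmploymentHistory>
    <Employment country="US">
      <JobDetails><Job/><Job/><Job/></JobDetails>
    </Employment>
  </EmploymentHistory>
  <EmploymentHistory>
    <Employment country="UK">
      <JobDetails><Job/><Job/></JobDetails>
    </Employment>
  </EmploymentHistory>
</Employee>
"""


def count_leaves(xml_text, leaf_tag):
    """Number of elements named leaf_tag, i.e. the number of blocks to expect."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(leaf_tag))


def leaves_per_branch(xml_text, branch_tag, key_attr, leaf_tag):
    """Count leaf_tag elements under each branch_tag, keyed by an attribute."""
    root = ET.fromstring(xml_text)
    return {
        branch.get(key_attr): len(list(branch.iter(leaf_tag)))
        for branch in root.iter(branch_tag)
    }


if __name__ == "__main__":
    print(count_leaves(SAMPLE, "Job"))
    print(leaves_per_branch(SAMPLE, "Employment", "country", "Job"))
```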
Best answer
As requested, here is a generic solution:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- The nodes whose "upward structure" is reproduced: one output block per node -->
  <xsl:param name="pLeafNodes" select="//Level-4"/>

  <xsl:template match="/">
    <t>
      <xsl:call-template name="StructRepro"/>
    </t>
  </xsl:template>

  <!-- Rebuilds the document from its top element once for every node in $pLeaves -->
  <xsl:template name="StructRepro">
    <xsl:param name="pLeaves" select="$pLeafNodes"/>

    <xsl:for-each select="$pLeaves">
      <xsl:apply-templates mode="build" select="/*">
        <xsl:with-param name="pChild" select="."/>
        <xsl:with-param name="pLeaves" select="$pLeaves"/>
      </xsl:apply-templates>
    </xsl:for-each>
  </xsl:template>

  <xsl:template mode="build" match="node()|@*">
    <xsl:param name="pChild"/>
    <xsl:param name="pLeaves"/>

    <xsl:copy>
      <xsl:apply-templates mode="build" select="@*"/>

      <!-- count(.|$set) = count($set) is true only when the context node belongs to $set -->
      <xsl:variable name="vLeafChild" select=
        "*[count(.|$pChild) = count($pChild)]"/>

      <xsl:choose>
        <!-- The current leaf is a child here: keep it and every child that is not a leaf -->
        <xsl:when test="$vLeafChild">
          <xsl:apply-templates mode="build"
               select="$vLeafChild
                      |
                       node()[not(count(.|$pLeaves) = count($pLeaves))]">
            <xsl:with-param name="pChild" select="$pChild"/>
            <xsl:with-param name="pLeaves" select="$pLeaves"/>
          </xsl:apply-templates>
        </xsl:when>
        <!-- Otherwise keep children that contain no leaves, or that contain the current leaf -->
        <xsl:otherwise>
          <xsl:apply-templates mode="build" select=
            "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                   or
                    .//*[count(.|$pChild) = count($pChild)]
                   ]
            ">
            <xsl:with-param name="pChild" select="$pChild"/>
            <xsl:with-param name="pLeaves" select="$pLeaves"/>
          </xsl:apply-templates>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()"/>
</xsl:stylesheet>
When applied to the provided simplified (and generic) XML document:
<Level-1>
...
<Level-2>
...
<Level-3>
...
<Level-4>A</Level-4>
<Level-4>B</Level-4>
...
</Level-3>
...
</Level-2>
...
</Level-1>
the wanted, correct result is produced:
<Level-1>
...
<Level-2>
...
<Level-3>
<Level-4>A</Level-4>
</Level-3>
...
</Level-2>
...
</Level-1>
<Level-1>
...
<Level-2>
...
<Level-3>
<Level-4>B</Level-4>
</Level-3>
...
</Level-2>
...
</Level-1>
Now, if we change the line:
<xsl:param name="pLeafNodes" select="//Level-4"/>
to:
<xsl:param name="pLeafNodes" select="//Job"/>
and apply the transformation to the Employee XML document:
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
we again get the wanted, correct result:
<t>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title="Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title="Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title="Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
</t>
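Explanation: The processing is done in a named template (StructRepro) and is controlled by a single external parameter named pLeafNodes, which must contain a node-set of all nodes whose "upward structure" is to be reproduced in the result.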
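For readers who prefer a procedural view, the same idea can be sketched outside XSLT. The following is a minimal illustration using Python's standard `xml.etree.ElementTree`, not a line-by-line port of the stylesheet: the names `shred`, `build`, and `SAMPLE` are hypothetical, the leaves are selected by tag name instead of an XPath node-set, and edge cases such as leaves nested inside other leaves are not handled the way the stylesheet handles them. For each selected leaf it rebuilds the tree from the root, keeps that leaf, drops the other leaves, and drops any branch that holds leaves but not the chosen one.

```python
import copy
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the Employee document (an assumption for brevity).
SAMPLE = """<Employee name="A Name"><Age>28</Age>
<EmploymentHistory><Employment country="US"><JobDetails>
<Job title="Senior Developer"><Months>38</Months></Job>
<Job title="Senior Developer"><Months>6</Months></Job>
</JobDetails></Employment></EmploymentHistory>
<EmploymentHistory><Employment country="UK"><JobDetails>
<Job title="Junior Developer"><Months>25</Months></Job>
</JobDetails></Employment></EmploymentHistory>
<Available>true</Available></Employee>"""


def build(node, chosen, leaf_ids):
    """Copy node, keeping the chosen leaf and every branch without leaves."""
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.text, clone.tail = node.text, node.tail
    for child in node:
        if child is chosen:
            # The one leaf this output block is built around.
            clone.append(copy.deepcopy(child))
        elif id(child) in leaf_ids:
            # A sibling leaf: it gets its own output block, so skip it here.
            continue
        elif (any(id(d) in leaf_ids for d in child.iter())
              and not any(d is chosen for d in child.iter())):
            # A branch that holds leaves, but not the chosen one: drop it.
            continue
        else:
            clone.append(build(child, chosen, leaf_ids))
    return clone


def shred(root, leaf_tag):
    """Return one rebuilt copy of the tree per element named leaf_tag."""
    leaves = list(root.iter(leaf_tag))
    leaf_ids = {id(leaf) for leaf in leaves}  # identity, not equality
    return [build(root, leaf, leaf_ids) for leaf in leaves]


if __name__ == "__main__":
    for block in shred(ET.fromstring(SAMPLE), "Job"):
        print(ET.tostring(block, encoding="unicode"))
```

Calling `shred(root, "Level-4")` instead of `shred(root, "Job")` plays the same role as changing the `pLeafNodes` parameter in the stylesheet.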
Source: the Stack Overflow question "java - Shredding XML via XSLT in Java": https://stackoverflow.com/questions/8548403/