I'm in the very painful process of converting Word-based documents to XML, and I've run into the following problem:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">Is this a
quote</hi>?” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is a
quote</hi>” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is
definitely a quote</hi>!” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text.„<hi rend="italics">This is a
first quote</hi>” (Source). „<hi rend="italics">Sometimes there is a second quote as
well</hi>!?” (Source). </p>
</root>
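The <p> nodes have mixed content. The <element> nodes I already took care of in a previous iteration. The problem now is with the quotes and sources, which appear partly inside <hi rend="italics"/> and partly as text nodes.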
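To make the split concrete: the opening „ ends one text node, the quote itself is the text of the <hi> element, and the closing punctuation, the ” and the source all start the next text node. A minimal Python sketch using the standard library's xml.etree.ElementTree (purely an illustration outside XSLT, on one shortened paragraph) shows this; ElementTree stores the text that follows a child element in that child's `tail`.

```python
import xml.etree.ElementTree as ET

# One <p> from the question, shortened to a single line
p = ET.fromstring(
    '<p><element>This one is taken care of.</element> Some more text. '
    '„<hi rend="italics">Is this a quote</hi>?” (Source). </p>')

element, hi = list(p)
# The text after each child element is stored as that child's .tail
print(repr(element.tail))  # ' Some more text. „'
print(repr(hi.text))       # 'Is this a quote'
print(repr(hi.tail))       # '?” (Source). '
```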
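How can I, using XSLT 2.0:
- match all <hi rend="italics"> nodes that are immediately preceded by a text node whose last character is "„"?
- output the contents of <hi rend="italics"> as <quote>...</quote>, getting rid of the quotation marks ("„" and "”") but including inside <quote/> any question and exclamation marks that occur immediately after <hi rend="italics">, in its following sibling text node?
- transform the text between "(" and ")" in the text node after <hi rend="italics"> into <source>...</source>, without the brackets?
- keep the final full stop?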
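The second requirement is the fiddliest string step, because the ? and ! that belong to the quote sit at the start of the text node after </hi>. A minimal Python sketch of capturing that run with an anchored regular expression; `leading_punct` is a hypothetical helper, and Python's `re.DOTALL` plays the role of the XPath regex flag `s`. The last call shows why the flag matters: without it, `.*` stops at a line break, the anchored pattern never matches, and the whole text comes back unchanged.

```python
import re

def leading_punct(tail: str, dotall: bool = True) -> str:
    """Return the run of ? and ! that opens `tail`, or '' if there is none.

    Same shape as the XPath expression
    if (matches($t, '^[?!]+')) then replace($t, '^([?!]+).*$', '$1', 's') else ()
    """
    if not re.match(r'[?!]+', tail):
        return ''
    flags = re.DOTALL if dotall else 0
    # re.sub leaves the string untouched when the pattern never matches
    return re.sub(r'^([?!]+).*$', r'\1', tail, flags=flags)

print(repr(leading_punct('!?” (Source). ')))                # '!?'
print(repr(leading_punct('” (Source). ')))                  # ''
print(repr(leading_punct('?”\n(Source). ')))                # '?'
print(repr(leading_punct('?”\n(Source). ', dotall=False)))  # '?”\n(Source). '
```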
In other words, my output should look like this:
<root>
<p>
<element>This one is taken care of.</element> Some more text. <quote>Is this a quote?</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a quote</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is definitely a quote!</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a first quote</quote> <source>Source</source>. <quote>Sometimes there is a second quote as well!?</quote> <source>Source</source>.
</p>
</root>
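I've never dealt with mixed content and string manipulation like this before, and the whole thing really throws me off. I'd be very grateful for any hints.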
Best answer
This transformation:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match=
"hi[@rend='italics'
and
preceding-sibling::node()[1][self::text()[ends-with(., '„')]]
]">
<quote>
<xsl:value-of select=
"concat(.,
if(matches(following-sibling::text()[1], '^[?!]+'))
then replace(following-sibling::text()[1], '^([?!]+).*$', '$1')
else()
)
"/>
</quote>
</xsl:template>
<xsl:template match="text()[true()]">
<xsl:variable name="vThis" select="."/>
<xsl:variable name="vThis2" select="translate($vThis, '„”?!', '')"/>
<xsl:value-of select="substring-before(concat($vThis2, '('), '(')"/>
<xsl:if test="contains($vThis2, '(')">
<source>
<xsl:value-of select=
"substring-before(substring-after($vThis2, '('), ')')"/>
</source>
<xsl:value-of select="substring-after($vThis2, ')')"/>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
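The text() template above relies on a few XPath string idioms. A minimal Python sketch that mirrors them one-to-one, only to show what each call returns; the helper names (`xp_translate_delete`, `xp_substring_before`, `xp_substring_after`, `text_rule`) are hypothetical and not part of the answer. Note the `concat($vThis2, '(')` trick: appending the delimiter guarantees that `substring-before` finds it, so the whole string comes back when the text contains no `(`.

```python
def xp_translate_delete(s: str, chars: str) -> str:
    # translate($s, chars, '') deletes every character that appears in chars
    return s.translate({ord(c): None for c in chars})

def xp_substring_before(s: str, sep: str) -> str:
    # substring-before returns '' when sep does not occur in s
    i = s.find(sep)
    return s[:i] if i >= 0 else ''

def xp_substring_after(s: str, sep: str) -> str:
    # substring-after returns '' when sep does not occur in s
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ''

def text_rule(text: str):
    """Return (leading text, source text or None, trailing text) for one text node."""
    t = xp_translate_delete(text, '„”?!')
    before = xp_substring_before(t + '(', '(')  # the concat(..., '(') idiom
    if '(' not in t:
        return before, None, ''
    source = xp_substring_before(xp_substring_after(t, '('), ')')
    return before, source, xp_substring_after(t, ')')

print(text_rule('?” (Source). '))       # (' ', 'Source', '. ')
print(text_rule(' Some more text. „'))  # (' Some more text. ', None, '')
```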
when applied to the provided XML document:
<root>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">Is this a
quote</hi>?” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is a
quote</hi>” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is
definitely a quote</hi>!” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text.„<hi rend="italics">This is a
first quote</hi>” (Source). „<hi rend="italics">Sometimes there is a second quote as
well</hi>!?” (Source). </p>
</root>
produces the wanted, correct result:
<root>
<p>
<element>This one is taken care of.</element> Some more text. <quote>Is this a
quote?</quote> <source>Source</source>. </p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a
quote</quote> <source>Source</source>. </p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is
definitely a quote!</quote> <source>Source</source>. </p>
<p>
<element>This one is taken care of.</element> Some more text.<quote>This is a
first quote</quote> <source>Source</source>. <quote>Sometimes there is a second quote as
well!?</quote> <source>Source</source>. </p>
</root>
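For readers without an XSLT 2.0 processor, the same two-stage idea (first decide which <hi> elements are quotes by looking at the original text, then clean the text runs) can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustrative port under simplifying assumptions, not the answer's code: `transform` and `clean` are hypothetical names, only one (...) per text run is handled (as in the stylesheet), and an unmatched "(" is handled differently from the XPath version (here everything after it becomes the source).

```python
import re
import xml.etree.ElementTree as ET

# Same characters the stylesheet deletes with translate(., '„”?!', '')
STRIP = str.maketrans('', '', '„”?!')

def clean(text):
    """Clean one text run; return (leading text, new <source> element or None)."""
    t = (text or '').translate(STRIP)
    if '(' not in t:
        return t, None
    before, _, rest = t.partition('(')
    src = ET.Element('source')
    # Text up to ")" becomes the source; the rest (e.g. the full stop) is its tail
    src.text, _, src.tail = rest.partition(')')
    return before, src

def transform(el):
    children = list(el)
    quotes = set()
    # Pass 1: an italic <hi> right after a text run ending in „ becomes <quote>
    prev = el.text
    for child in children:
        if (child.tag == 'hi' and child.get('rend') == 'italics'
                and (prev or '').endswith('„')):
            m = re.match(r'[?!]+', child.tail or '')
            text, tail = ''.join(child.itertext()), child.tail
            child.clear()  # removes attributes, sub-elements, text and tail
            child.tag = 'quote'
            child.text = text + (m.group(0) if m else '')
            child.tail = tail
            quotes.add(child)
        prev = child.tail
    # Recurse into everything except the finished quotes
    for child in children:
        if child not in quotes:
            transform(child)
    # Pass 2: clean the text runs directly under el, inserting <source> elements
    new_children = []
    el.text, src = clean(el.text)
    if src is not None:
        new_children.append(src)
    for child in children:
        new_children.append(child)
        child.tail, src = clean(child.tail)
        if src is not None:
            new_children.append(src)
    for child in children:
        el.remove(child)
    el.extend(new_children)

p = ET.fromstring('<p><element>Done.</element> Text.„'
                  '<hi rend="italics">Is this a quote</hi>?” (Source). </p>')
transform(p)
print(ET.tostring(p, encoding='unicode'))
# <p><element>Done.</element> Text.<quote>Is this a quote?</quote> <source>Source</source>. </p>
```

Deciding on quotes before any text is cleaned matters: the „ that identifies a quote is exactly one of the characters the cleaning step deletes.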
Regarding xml - mixed content and string manipulation cleanup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12690177/