Python: xpath.find() does not find new elements that were added without a namespace

Today I stumbled over a peculiar behaviour of the xml.dom and xpath modules, and it took me a while to figure out that it is related to XML namespaces:

from xml.dom import minidom
import xpath

zooXml = """<?xml version="1.0" encoding="utf-8"?>
<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/></Compound>
</Zoo>"""

mydom = minidom.parseString(zooXml)
compound = xpath.findnode('/Zoo/Compound', mydom)
print compound.toxml() # as expected: <Compound><Chimp/></Compound>
print xpath.find("Chimp", compound) # as expected: [<DOM Element: Chimp at 0x24c0cc8>]

So far so good, but if I now add another Chimp element without explicitly specifying its namespace, xpath does not find the new element:

newChimp = mydom.createElement("Chimp")
compound.appendChild(newChimp)
print compound.toxml() # ok, two chimps now: <Compound><Chimp/><Chimp/></Compound>
print xpath.find("Chimp", compound) # wait a second, that's still only one chimp: [<DOM Element: Chimp at 0x24a0d88>]

After re-parsing the modified XML, xpath finds both elements:

mydom = minidom.parseString(mydom.toxml())
compound = xpath.findnode('/Zoo/Compound', mydom)
print xpath.find("Chimp", compound) # now it finds both chimps: [<DOM Element: Chimp at 0x24c9808>, <DOM Element: Chimp at 0x24c9888>]

Also, if I create the new element with a namespace, xpath finds it without re-parsing:

babyChimp = mydom.createElementNS(mydom.firstChild.namespaceURI, "Chimp")
compound.appendChild(babyChimp)
print xpath.find("Chimp", compound) # that worked: [<DOM Element: Chimp at 0x24c9808>, <DOM Element: Chimp at 0x24c9888>, <DOM Element: Chimp at 0x24c9548>]

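The question is: is this behaviour correct or is it a bug? Shouldn't the namespace of Chimp be implicit? After all, the XML generated will be the same, regardless of whether I use xml.dom.createElement() or xml.dom.createElementNS(). And if this is a bug, then where is it? In xml.dom or in xpath?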

FWIW: I observed this behaviour with the Windows distributions of Python 2.7.5 and 2.7.4, in both cases with version 0.1 of the xpath module.

Best answer

In short:

The question is: is this behaviour correct or is it a bug?

The behaviour seems incorrect, or at least undesirable, so I would call it a bug or an unfinished feature.

Shouldn't the namespace of Chimp be implicit?

It is implicit when a string is parsed, but not when an element is created and then added to the tree.

After all, the XML generated will be the same, regardless of whether I use xml.dom.createElement() or xml.dom.createElementNS().

The generated XML is not the same, or at least it should not be.

And if this is a bug, then where is it?

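My guess is that the bug is in the toxml() method: it does not output the empty default namespace for the second Chimp element, so that element inherits the default namespace of the Zoo element when the modified document is re-parsed.

More precisely:

At first your document is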

<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/></Compound>
</Zoo>

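Then you try to find the Chimp element without defining any namespace context for the lookup. In that case, however, the xpath module appears to initialize the context from the node's document element, so the lookup succeeds.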

Then you add another Chimp element, but without a namespace, and your document becomes

<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/><Chimp xmlns=""/></Compound>
</Zoo>

The second lookup returns only one element because the other one is in a different namespace.
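
The namespace difference can be checked directly on the DOM nodes, without the third-party xpath module. The following is a minimal sketch, assuming Python 3 and only the standard-library xml.dom.minidom; the variable names are illustrative and not part of the original question.

```python
from xml.dom import minidom

ZOO_XML = """<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/></Compound>
</Zoo>"""

doc = minidom.parseString(ZOO_XML)
compound = doc.getElementsByTagName("Compound")[0]

# The Chimp that came from the parser inherited the default namespace of Zoo.
parsed_chimp = compound.getElementsByTagName("Chimp")[0]

# createElement() is not namespace-aware: the new node gets no namespace at all.
new_chimp = doc.createElement("Chimp")
compound.appendChild(new_chimp)

print(repr(parsed_chimp.namespaceURI))  # 'http://foo.bar/zoo'
print(repr(new_chimp.namespaceURI))     # None
```

Both elements have the same tag name, so any namespace-aware lookup has to treat them as different elements.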

I guess this is where the bug happens: instead of the correct version above, the actual output of the toxml() method is

<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/><Chimp/></Compound>
</Zoo>

Re-parsing that text makes the second Chimp element inherit the same (default) namespace as the first one, so your third lookup returns 2 elements.
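
The round trip can be reproduced with the standard library alone. This is a sketch under the assumption of Python 3 and xml.dom.minidom; chimps_in_zoo_namespace is a hypothetical helper written for this illustration and stands in for the namespace-aware xpath lookup.

```python
from xml.dom import minidom

ZOO_NS = "http://foo.bar/zoo"
ZOO_XML = """<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/></Compound>
</Zoo>"""


def chimps_in_zoo_namespace(doc):
    """Count Chimp children of Compound whose namespaceURI is ZOO_NS."""
    compound = doc.getElementsByTagName("Compound")[0]
    return sum(
        1
        for node in compound.childNodes
        if node.nodeType == node.ELEMENT_NODE
        and node.tagName == "Chimp"
        and node.namespaceURI == ZOO_NS
    )


doc = minidom.parseString(ZOO_XML)
doc.getElementsByTagName("Compound")[0].appendChild(doc.createElement("Chimp"))

before = chimps_in_zoo_namespace(doc)  # in-memory tree
serialized = doc.toxml()               # the new node is written without xmlns=""
after = chimps_in_zoo_namespace(minidom.parseString(serialized))  # re-parsed tree

print(before, after)  # 1 2
```

The count changes only because serialization drops the information that the new element had no namespace.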

Adding the last Chimp element turns the document into

<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/><Chimp/><Chimp xmlns='http://foo.bar/zoo'/></Compound>
</Zoo>

which is exactly the same as

<Zoo xmlns='http://foo.bar/zoo'>
  <Compound><Chimp/><Chimp/><Chimp/></Compound>
</Zoo>

Again, all Chimp elements are in the same namespace, so find returns all of them.
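
For comparison, here is a short sketch of the createElementNS() case, again assuming Python 3 and only xml.dom.minidom, with illustrative variable names. The element created this way is in the Zoo namespace from the start, so no re-parsing is needed for the nodes to match.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<Zoo xmlns='http://foo.bar/zoo'><Compound><Chimp/></Compound></Zoo>"
)
compound = doc.getElementsByTagName("Compound")[0]
zoo_ns = doc.documentElement.namespaceURI

# Create the new element explicitly in the namespace of the document element.
baby_chimp = doc.createElementNS(zoo_ns, "Chimp")
compound.appendChild(baby_chimp)

same_namespace = [child.namespaceURI == zoo_ns for child in compound.childNodes]
print(same_namespace)    # [True, True]
print(compound.toxml())  # <Compound><Chimp/><Chimp/></Compound>
```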

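This bug seems to be related: http://bugs.python.org/issue1371937. Despite its status, it is effectively still open, because it was closed as a duplicate of another bug that is still open.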

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16980521/
