python - 在 Python 中使用 lxml 遍历 XML 的最快/最佳方法

我有一个如下所示的 XML 文件:

xml = '''<?xml version="1.0"?>
        <root>
            <item>text</item>
            <item2>more text</item2>
            <targetroot>
                <targetcontainer>
                    <target>text i want to get</target>
                </targetcontainer>
                <targetcontainer>
                    <target>text i want to get</target>
                </targetcontainer>
            </targetroot>
            ...more items
        </root>
'''

使用 lxml，我尝试访问元素中的文本。我已经找到了解决方案，但我确信有更好、更有效的方法来做到这一点。我的解决方案:

target = etree.XML(xml)

for x in target.getiterator('root'):
    item1 = x.findtext('item')
    for target in x.iterchildren('targetroot'):
        for t in target.iterchildren('targetcontainer'):
            targetText = t.findtext('target')

虽然这有效，因为它使我可以访问根中的所有元素以及目标元素，但我很难相信这是最有效的解决方案。

所以我的问题是:是否有一种更有效的方法来访问的文本，同时保留在根循环中，因为我还需要访问其他元素。

最佳答案

您可以使用XPath :

for x in target.xpath('/root/targetroot/targetcontainer/target'):
    print x.text

我们询问与路径匹配的所有元素。在本例中，路径为 /root/targetroot/targetcontainer/target ，这意味着

all the <target> elements that are inside a <targetcontainer> element, inside a <targetroot> element, inside a <root> element. Also, the <root> element should be the document root because it is preceded by /, which means the beginning of the document.

此外，您的 XML 文档还有两个问题。一、<?xml version="1.0"?>声明应该是文档中的第一件事 - 在这个例子中，它前面有一个换行符和一些空格。另外，它不是标签，不应关闭，因此 </xml>字符串末尾的内容应该被删除。无论如何，我已经编辑了你的问题。

编辑:这个解决方案还可以改进。您不需要传递所有路径 - 您只需询问所有元素 <target>文档内。这是通过在标签名称前添加两个斜杠来完成的。因为您想要所有 <target>文本，无论它们在哪里，这可能是一个更好的解决方案。因此，上面的循环可以写成:

for x in target.xpath('//target'):
    print x.text

一开始我尝试过，但没有成功。然而，问题是 XML 中的语法问题，而不是 XPath，但我尝试了另一条更长的路径，但忘记重试此路径。对不起!不管怎样，我希望我能对 XPath 有所了解:)

关于python - 在 Python 中使用 lxml 遍历 XML 的最快/最佳方法，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/8548531/

python - 在 Python 中使用 lxml 遍历 XML 的最快/最佳方法

上一篇：python - Django + uWSGI + 发送电子邮件的神秘问题

下一篇：python - 这样的进口合法还是不推荐？