python - How do I parse an .xml file with multiple nested children in Python?

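I am using Python to parse a very complex .xml file that has many nested children. Accessing some of the values it contains is quite annoying, because the code starts to get very ugly.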

Let me first show you the .xml file:

<?xml version="1.0" encoding="utf-8"?>
<Start>
  <step1 stepA="5" stepB="6" />
  <step2>
    <GOAL1>11111</GOAL1>
    <stepB>
      <stepBB>
        <stepBBB stepBBB1="pinco">1</stepBBB>
      </stepBB>
      <stepBC>
        <stepBCA>
          <GOAL2>22222</GOAL2>
        </stepBCA>
      </stepBC>
      <stepBD>-NO WOMAN NO CRY                                            
              -I SHOT THE SHERIF                                                           
              -WHO LET THE DOGS OUT
      </stepBD>
    </stepB>
  </step2>
  <step3>
    <GOAL3 GOAL3_NAME="GIOVANNI" GOAL3_ID="GIO">
      <stepB stepB1="12" stepB2="13" />
      <stepC>XXX</stepC>
      <stepC>
        <stepCC>
          <stepCC GOAL4="saf12">33333</stepCC>
        </stepCC>
      </stepC>
    </GOAL3>
  </step3>
  <step3>
    <GOAL3 GOAL3_NAME="ANDREA" GOAL3_ID="DRW">
      <stepB stepB1="14" stepB2="15" />
      <stepC>YYY</stepC>
      <stepC>
        <stepCC>
          <stepCC GOAL4="fwe34">44444</stepCC>
        </stepCC>
      </stepC>
    </GOAL3>
  </step3>
</Start>

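My goal is to access the values contained in the children named "GOAL" in a better way than the method I wrote in the sample code below. In addition, I would like to find an automated way to get the values of GOALs that have the same type of tag but belong to different children with the same name.

Example: GIOVANNI and ANDREA both belong to the same type of tag (GOAL3_NAME), and they belong to different children that share the same name (<step3>).

This is the code I wrote: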

import xml.etree.ElementTree as ET
data = ET.parse('test.xml').getroot()

GOAL1 = data.getchildren()[1].getchildren()[0].text
print(GOAL1)

GOAL2 = data.getchildren()[1].getchildren()[1].getchildren()[1].getchildren()[0].getchildren()[0].text
print(GOAL2)

GOAL3 = data.getchildren()[2].getchildren()[0].text
print(GOAL3)

GOAL4_A = data.getchildren()[2].getchildren()[0].getchildren()[2].getchildren()[0].getchildren()[0].text
print(GOAL4_A)

GOAL4_B = data.getchildren()[3].getchildren()[0].getchildren()[2].getchildren()[0].getchildren()[0].text
print(GOAL4_B)

The output I get is as follows:

The output I would like should look like this:

11111 
22222
GIOVANNI
33333
ANDREA
44444

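As you can see, I can read GOAL1 and GOAL2 easily, but I am looking for a better coding practice to access these values, because in my view the code is too long and hard to read and understand.

The second thing I would like to do is get GOAL3 and GOAL4 in an automated way, so that I do not have to repeat similar lines of code and the result is more readable and understandable.

Note: as you can see, I am not able to read GOAL3. If possible, I would like to get both GOAL3_NAME and GOAL3_ID.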

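To make the structure of the .xml file easier to understand, I am posting a picture of what it looks like:

The highlighted elements are what I am looking for.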

Best answer

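This is a simple example that uses a recursive approach to iterate over the tree from start to end; you can collect the information you need from it. On Python 2, cElementTree was reportedly 15-20 times faster than the pure-Python module. Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically when it is available, and cElementTree was removed in Python 3.9, so the plain module is imported here.

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

def get_tail(root):
    # Depth-first walk: print each child's text, then recurse into that child
    for child in root:
        print(child.text)
        get_tail(child)

get_tail(root)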
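
Building on the recursive walk above, a path-based lookup may be closer to what the question asks for. The following is only a minimal sketch, not a definitive implementation. It assumes the structure of the test.xml shown in the question, inlined here in trimmed form so the snippet runs on its own. `XML`, `collect_goals`, and `goal3_ids` are hypothetical names introduced for illustration. It uses ElementTree's limited XPath support (`find`, `findall`, `findtext`) instead of chained index lookups; note that `getchildren()` was removed in Python 3.9. GOAL3 cannot be read through `.text` because the element contains child elements and carries its name and ID as attributes, so `.get()` is used for those.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's test.xml, inlined so the sketch is self-contained.
XML = """<Start>
  <step1 stepA="5" stepB="6" />
  <step2>
    <GOAL1>11111</GOAL1>
    <stepB>
      <stepBC>
        <stepBCA>
          <GOAL2>22222</GOAL2>
        </stepBCA>
      </stepBC>
    </stepB>
  </step2>
  <step3>
    <GOAL3 GOAL3_NAME="GIOVANNI" GOAL3_ID="GIO">
      <stepC>XXX</stepC>
      <stepC>
        <stepCC>
          <stepCC GOAL4="saf12">33333</stepCC>
        </stepCC>
      </stepC>
    </GOAL3>
  </step3>
  <step3>
    <GOAL3 GOAL3_NAME="ANDREA" GOAL3_ID="DRW">
      <stepC>YYY</stepC>
      <stepC>
        <stepCC>
          <stepCC GOAL4="fwe34">44444</stepCC>
        </stepCC>
      </stepC>
    </GOAL3>
  </step3>
</Start>"""

root = ET.fromstring(XML)


def collect_goals(root):
    """Return the GOAL values in document order (hypothetical helper)."""
    values = []
    # A path relative to the root replaces the chained index lookups.
    values.append(root.findtext("step2/GOAL1"))
    # ".//" searches all descendants, so intermediate tags need not be listed.
    values.append(root.findtext(".//GOAL2"))
    # findall returns every match, which covers the repeated <step3> elements.
    for goal3 in root.findall("step3/GOAL3"):
        # The name is stored as an attribute, not as element text.
        values.append(goal3.get("GOAL3_NAME"))
        # [@GOAL4] keeps only the descendant stepCC that has a GOAL4 attribute.
        values.append(goal3.findtext(".//stepCC[@GOAL4]"))
    return values


for value in collect_goals(root):
    print(value)

# GOAL3_ID is read the same way as GOAL3_NAME.
goal3_ids = [goal3.get("GOAL3_ID") for goal3 in root.findall("step3/GOAL3")]
print(goal3_ids)
```

With the sample data this prints 11111, 22222, GIOVANNI, 33333, ANDREA and 44444 on separate lines, followed by the list of the two IDs. For the real file, `ET.parse('test.xml').getroot()` would take the place of `ET.fromstring(XML)`.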

Regarding "python - How do I parse an .xml file with multiple nested children in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40628499/
