python - Error when extracting XML node text with ElementTree in Python

Tags: python xml elementtree

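I am trying to extract text from specific nodes. For every person I want the id value and the age. For person 10, the age should be 30, as given by the text of the attribute element with name="age". However, I end up with an error (see my code and the resulting error below) saying there is no text, and I don't understand why.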

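I have used the same code on an almost identical structure before, and it worked without any problems. I would be very grateful if someone could give me a hint about what is causing the issue.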

XML structure:

<population desc="Switzerland Baseline">
    <person id="10">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >30</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >no</attribute>
            <attribute name="home_x" class="java.lang.Double" >2679482.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1237545.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >374775</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >281604</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000137</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240012081086</attribute>
        </attributes>
        <plan score="-9.025277777777776" selected="yes">
            <activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" end_time="07:50:56" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->

    <person id="100">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >3</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >false</attribute>
            <attribute name="hasLicense" class="java.lang.String" >no</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >true</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >false</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >324961</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >-1</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >true</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000049</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240013385042</attribute>
        </attributes>
        <plan score="0.0" selected="no">
            <activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
            </activity>
        </plan>

        <plan score="0.0" selected="yes">
            <activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->

    <person id="1000">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >48</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >yes</attribute>
            <attribute name="home_x" class="java.lang.Double" >2678966.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1235785.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >137604</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >496052</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000745</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240009138483</attribute>
        </attributes>
        <plan score="-437.00166666666667" selected="yes">
            <activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="05:33:00" >
            </activity>
            <leg mode="transit_walk" dep_time="07:15:00" trav_time="00:01:01">
                <route type="generic" start_link="812194" end_link="588385" trav_time="00:01:01" distance="73.45759253010056"></route>
            </leg>
            <activity type="pt interaction" link="588385" x="2682500.5564242266" y="1246491.125064118" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="07:16:01" trav_time="00:13:58">
                <route type="enriched_pt" start_link="588385" end_link="368678" trav_time="00:13:58" distance="8378.187255109851">{"inVehicleTime":420.0,"transferTime":418.7853395582497,"accessStopIndex":4,"egressStopindex":5,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05362"}</route>
            </leg>
            <activity type="pt interaction" link="368678" x="2685173.595399507" y="1238953.4179927576" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="07:30:00" trav_time="00:01:10">
                <route type="generic" start_link="368678" end_link="812077" trav_time="00:01:10" distance="82.96796919207021"></route>
            </leg>
            <activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="15:52:43" >
            </activity>
            <leg mode="outside" dep_time="15:52:43" trav_time="00:00:00">
                <route type="generic" start_link="812077" end_link="812077" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="16:59:00" >
            </activity>
            <leg mode="transit_walk" dep_time="16:59:00" trav_time="01:42:47">
                <route type="generic" start_link="812077" end_link="555704" trav_time="01:42:47" distance="7401.037993401233"></route>
            </leg>
            <activity type="outside" link="555704" facility="outside_7" x="2690699.2533230074" y="1240302.4760125757" end_time="17:07:39" >
            </activity>
            <leg mode="access_walk" dep_time="17:07:39" trav_time="00:33:33">
                <route type="generic" start_link="555704" end_link="348266" trav_time="00:33:33" distance="2415.2684761259893"></route>
            </leg>
            <activity type="pt interaction" link="348266" x="2688841.9870530544" y="1240253.9986282045" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="17:41:12" trav_time="00:10:48">
                <route type="enriched_pt" start_link="348266" end_link="166875" trav_time="00:10:48" distance="3166.770768054601">{"inVehicleTime":420.0,"transferTime":228.0,"accessStopIndex":0,"egressStopindex":10,"transitRouteId":"02828_023","transitLineId":"VZO_line961","departureId":"125106"}</route>
            </leg>
            <activity type="pt interaction" link="166875" x="2687161.005729228" y="1240076.9559941967" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="17:52:00" trav_time="00:00:21">
                <route type="generic" start_link="166875" end_link="771010" trav_time="00:00:21" distance="25.959922652207396"></route>
            </leg>
            <activity type="pt interaction" link="771010" x="2687180.6471416447" y="1240073.3528400902" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="17:52:21" trav_time="00:19:38">
                <route type="enriched_pt" start_link="771010" end_link="955474" trav_time="00:19:38" distance="9742.201043728513">{"inVehicleTime":960.0,"transferTime":218.36673112316203,"accessStopIndex":1,"egressStopindex":7,"transitRouteId":"19622_002","transitLineId":"SBB_S16_8503016-8503103","departureId":"06187"}</route>
            </leg>
            <activity type="pt interaction" link="955474" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="18:12:00" trav_time="00:00:00">
                <route type="generic" start_link="955474" end_link="955504" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="pt interaction" link="955504" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="18:12:00" trav_time="00:07:00">
                <route type="enriched_pt" start_link="955504" end_link="4223" trav_time="00:07:00" distance="3304.5168456795577">{"inVehicleTime":120.0,"transferTime":300.0,"accessStopIndex":2,"egressStopindex":3,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05406"}</route>
            </leg>
            <activity type="pt interaction" link="4223" x="2681934.8161827456" y="1247302.7661533705" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="18:19:00" trav_time="00:00:59">
                <route type="generic" start_link="4223" end_link="586407" trav_time="00:00:59" distance="71.92245024668337"></route>
            </leg>
            <activity type="pt interaction" link="586407" x="2681990.0107938214" y="1247298.9705903793" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="18:19:59" trav_time="01:01:00">
                <route type="enriched_pt" start_link="586407" end_link="617712" trav_time="01:01:00" distance="15771.43292404094">{"inVehicleTime":1920.0,"transferTime":1740.0646247944242,"accessStopIndex":0,"egressStopindex":19,"transitRouteId":"07744_004","transitLineId":"PAG_line236","departureId":"77196"}</route>
            </leg>
            <activity type="pt interaction" link="617712" x="2679299.97008475" y="1237575.0077440983" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="19:21:00" trav_time="00:15:42">
                <route type="generic" start_link="617712" end_link="360294" trav_time="00:15:42" distance="1130.0689845763227"></route>
            </leg>
            <activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="17:53:00" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->

    <person id="1000157">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >52</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_ALL</attribute>
            <attribute name="carAvail" class="java.lang.String" >always</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >yes</attribute>
            <attribute name="home_x" class="java.lang.Double" >2695732.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1259962.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >275258</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >212563</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >true</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201202300043212</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240010759877</attribute>
        </attributes>
        <plan score="-1.7305555555555556" selected="yes">
            <activity type="outside" link="557064" facility="outside_8" x="2691803.987049347" y="1253846.2689263367" end_time="07:04:33" >
            </activity>
        </plan>

    </person>
</population>

My code:

import xml.etree.ElementTree as ET
import pandas as pd
import gzip


tree = ET.parse(gzip.open('STORAGE/500/1/output_plans.xml.gz', 'r'))

root = tree.getroot()
rows = []
for it in root.iter('person'):
    id = it.attrib['id']
    age = it.find('attributes/attribute[@name="age"]').text 
    rows.append([id, age])

d = pd.DataFrame(rows, columns=['id', 'age'])

Error:

AttributeError                            Traceback (most recent call last)
<ipython-input-2-badcde9dbf74> in <module>
      8 for it in root.iter('person'):
      9     id = it.attrib['id']
---> 10     age = it.find('attributes/attribute[@name="age"]').text
     11     rows.append([id, age])
     12 

AttributeError: 'NoneType' object has no attribute 'text'
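`Element.find()` returns `None` when no element matches the path, and calling `.text` on that `None` raises exactly this `AttributeError`. The excerpt above parses cleanly with the question's code, which suggests (an assumption, since the full file is not shown) that some `person` further down has no `age` attribute, or no `<attributes>` block at all. A minimal stdlib-only sketch to list such persons; the inline `SAMPLE` and the helper `persons_without_age` are hypothetical stand-ins for the real file:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: person "2" lacks an age attribute and person "3"
# has no <attributes> block, mimicking what the full file might contain.
SAMPLE = """
<population>
  <person id="1">
    <attributes>
      <attribute name="age" class="java.lang.Integer">30</attribute>
    </attributes>
  </person>
  <person id="2">
    <attributes>
      <attribute name="sex" class="java.lang.String">f</attribute>
    </attributes>
  </person>
  <person id="3"/>
</population>
"""


def persons_without_age(root):
    """Return the ids of persons for which the question's path finds nothing."""
    missing = []
    for person in root.iter('person'):
        # find() returns None, rather than raising, when nothing matches;
        # calling .text on that None is what triggers the AttributeError.
        if person.find('attributes/attribute[@name="age"]') is None:
            missing.append(person.get('id'))
    return missing


print(persons_without_age(ET.fromstring(SAMPLE)))  # ['2', '3']
```

Against the real data this would be `persons_without_age(tree.getroot())`, after the `ET.parse(gzip.open(...))` call from the question.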

Best answer

Consider parsing all attributes!

rows = []
for it in root.iter('person'):
    attribute = it.find('attributes')

    id_dict = {'id':it.attrib['id']}
    attrs_dict = {a.attrib['name']:a.text for a in attribute.findall('attribute')}

    # MERGE DICTIONARIES (REQUIRES Python 3.5+)
    rows.append({**id_dict, **attrs_dict})

d = pd.DataFrame(rows)

print(d)    
#         id age bikeAvailability carAvail employed  ... ptHasVerbund sex spRegion statpopHouseholdId  statpopPersonId
# 0       10  30         FOR_SOME    never     true  ...        false   f        1    201200010000137  201240012081086
# 1      100   3         FOR_SOME    never    false  ...         true   f        1    201200010000049  201240013385042
# 2     1000  48         FOR_SOME    never     true  ...        false   f        1    201200010000745  201240009138483
# 3  1000157  52          FOR_ALL   always     true  ...         true   f        1    201202300043212  201240010759877
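The `DataFrame` above stores every value as a string (`age` holds `'30'`, not `30`). Since each `<attribute>` names a Java class in its `class` attribute, one option is to convert values by that name before building the frame. A minimal stdlib sketch, assuming the five classes seen in the excerpt are the only ones that occur; `CONVERTERS` and `typed_attributes` are hypothetical names, and unknown classes fall back to the raw string:

```python
import xml.etree.ElementTree as ET

# Assumed mapping: only the Java classes seen in the excerpt are covered.
CONVERTERS = {
    'java.lang.Integer': int,
    'java.lang.Long': int,
    'java.lang.Double': float,
    'java.lang.Boolean': lambda s: s == 'true',
    'java.lang.String': str,
}

# Hypothetical one-person sample based on the first person in the question.
SAMPLE = """
<person id="10">
  <attributes>
    <attribute name="age" class="java.lang.Integer">30</attribute>
    <attribute name="employed" class="java.lang.Boolean">true</attribute>
    <attribute name="home_x" class="java.lang.Double">2679482.0</attribute>
    <attribute name="sex" class="java.lang.String">f</attribute>
    <attribute name="statpopPersonId" class="java.lang.Long">201240012081086</attribute>
  </attributes>
</person>
"""


def typed_attributes(person):
    """Return {name: value} for one <person>, converting by the class attribute."""
    values = {}
    for attr in person.findall('attributes/attribute'):
        if attr.text is None:
            # Empty element: keep None rather than failing inside a converter
            values[attr.get('name')] = None
            continue
        convert = CONVERTERS.get(attr.get('class'), str)  # unknown class -> string
        values[attr.get('name')] = convert(attr.text)
    return values


person = ET.fromstring(SAMPLE)
print({'id': person.get('id'), **typed_attributes(person)})
# {'id': '10', 'age': 30, 'employed': True, 'home_x': 2679482.0, 'sex': 'f', 'statpopPersonId': 201240012081086}
```

Rows built as `{'id': it.attrib['id'], **typed_attributes(it)}` inside the answer's loop can then be passed to `pd.DataFrame` unchanged.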

Or use a nested list/dict comprehension!

attrs_list = [{**{'id':it.attrib['id']}, **{a.attrib['name']:a.text 
                    for a in it.find('attributes').findall('attribute')}} 
                    for it in root.iter('person')]

d = pd.DataFrame(attrs_list)

print(d)
#         id age bikeAvailability carAvail employed hasLicense  ... ptHasStrecke ptHasVerbund sex spRegion statpopHouseholdId  statpopPersonId
# 0       10  30         FOR_SOME    never     true         no  ...        false        false   f        1    201200010000137  201240012081086
# 1      100   3         FOR_SOME    never    false         no  ...         true         true   f        1    201200010000049  201240013385042
# 2     1000  48         FOR_SOME    never     true        yes  ...        false        false   f        1    201200010000745  201240009138483
# 3  1000157  52          FOR_ALL   always     true        yes  ...        false         true   f        1    201202300043212  201240010759877
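Both variants call `.findall` on the result of `it.find('attributes')`, so a `person` with no `<attributes>` block at all would still fail, this time with `AttributeError: 'NoneType' object has no attribute 'findall'`. Searching the path `attributes/attribute` directly from the `person` element sidesteps that, because `findall` returns an empty list when nothing matches. A stdlib-only sketch; the inline `SAMPLE` and the helper `person_rows` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: person "3" has no <attributes> block at all.
SAMPLE = """
<population>
  <person id="1">
    <attributes>
      <attribute name="age" class="java.lang.Integer">30</attribute>
    </attributes>
  </person>
  <person id="2">
    <attributes>
      <attribute name="sex" class="java.lang.String">f</attribute>
    </attributes>
  </person>
  <person id="3"/>
</population>
"""


def person_rows(root):
    """Return one dict per person: its id plus every attribute name/text pair."""
    rows = []
    for person in root.iter('person'):
        row = {'id': person.get('id')}
        # findall() yields [] (not None) when <attributes> is missing,
        # so persons without it still produce an id-only row.
        for attr in person.findall('attributes/attribute'):
            row[attr.get('name')] = attr.text
        rows.append(row)
    return rows


rows = person_rows(ET.fromstring(SAMPLE))
print(rows)
# [{'id': '1', 'age': '30'}, {'id': '2', 'sex': 'f'}, {'id': '3'}]
```

Passing `rows` to `pd.DataFrame` fills attributes a person lacks with `NaN` instead of raising.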

Regarding python - Error when extracting XML node text with ElementTree in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62024004/
