python - Switching from amara to lxml in Python

I am trying to accomplish something like this with the lxml library: http://www.xml.com/pub/a/2005/01/19/amara.html

from amara import binderytools

container = binderytools.bind_file('labels.xml')
for l in container.labels.label:
    print l.name, 'of', l.address.city

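But I am having the hardest time getting my feet wet! What I want to do is: descend to the root node named "X", then descend to its second child, named "Y", then grab all of its children named "Z", and keep only those whose attribute "name" is set to "bacon". Then, for each remaining node, look at all of its children named "W" and keep only a subset based on some filter that examines W's only children, which are named A, B and C. I then need to process them along the lines of the following (unoptimized) pseudocode: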

result = []
X = root(doc(parse(xml_file_name)))
Y = X[1] # Second child
Zs = Y.children()
for Z in Zs:
    if Z.name != 'bacon': continue # skip
    Ws = Z.children()
    record = []
    assert(len(Ws) == 9)
    W0 = Ws[0]
    assert(W0.A == '42')
    record.append(str(W0.A) + " " + W0.B + " " + W0.C)
    ...
    W1 = Ws[1]
    assert(W1.A == '256')
    ...
    result.append(record)

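That is what I am trying to accomplish. I want to get it working before I try to make the code cleaner.

Please help, as I am lost in this API. Let me know if you have questions.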

Best answer

import lxml.etree as le
import io

content='''\
<foo><X><Y>skip this</Y><Y><Z name="apple"><W>not here</W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
</Y></X></foo>
'''
# BytesIO needs bytes, so encode the str first (required on Python 3)
doc=le.parse(io.BytesIO(content.encode('utf-8')))
# print(le.tostring(doc, pretty_print=True))
result=[]
Zs=doc.xpath('//X/Y[2]/Z[@name="bacon"]')
for Z in Zs:
    Ws=Z.xpath('W')
    record=[]
    assert(len(Ws)==2)  #<--- Change to 9        
    abc=Ws[0].xpath('descendant::text()')
    # print(abc)
    # ['42', 'b', 'c']
    assert(abc[0] == '42')
    record.append(' '.join(abc))
    abc=Ws[1].xpath('descendant::text()')    
    assert(abc[0] == '256')
    result.append(record)
print(result)
# [['42 b c'], ['42 b c']]
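The XPath step `Y[2]` above selects the second child named Y, which is not necessarily the second child overall (the `X[1]` of the pseudocode). As a minimal sketch of the same descent without XPath, the following uses lxml's ElementTree-style calls `find`, `findall`, `get` and `findtext`. The sample document and the names `xml` and `records` are assumptions made for illustration, and the sketch collects every W under a "bacon" Z rather than applying the question's per-W assertions.

```python
import lxml.etree as le

# Hypothetical sample, modelled on the document used in the answer above.
xml = b'''<foo><X><Y>skip this</Y><Y><Z name="apple"><W>not here</W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
</Y></X></foo>'''

root = le.fromstring(xml)
X = root.find('X')           # first child element named X
Y = X.findall('Y')[1]        # second child *named* Y, like XPath Y[2]
assert X[1].tag == 'Y'       # in this sample it is also the second child overall

records = []
for Z in Y.findall('Z'):
    if Z.get('name') != 'bacon':   # get() returns None when the attribute is absent
        continue
    for W in Z.findall('W'):
        # findtext() returns the text of the first child with the given tag
        records.append(' '.join(W.findtext(tag) for tag in ('A', 'B', 'C')))
print(records)
```

If the second child must be taken regardless of its tag, indexing the element directly (`X[1]`) is the closer match to the pseudocode.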

Here is one possible way to tighten up the inner loop, though I am only guessing at which records you want to keep:

result=[]  # start from an empty list so the output below is reproducible
for Z in Zs:
    Ws=Z.xpath('W')
    assert(len(Ws)==2)  #<--- Change to 9
    a_vals=('42','256')
    for W,a_val in zip(Ws,a_vals):
        abc=W.xpath('descendant::text()')
        assert(abc[0] == a_val)
        result.append([' '.join(abc)])
print(result)
# [['42 b c'], ['256 b c'], ['42 b c'], ['256 b c']]
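For code being ported from amara's bindery, `lxml.objectify` may feel more familiar, since it also exposes child elements as attributes. The sketch below is only an illustration of the amara snippet from the question rewritten that way: the XML content is a made-up stand-in for `labels.xml`, and the names `labels_xml`, `labels` and `lines` are hypothetical.

```python
from lxml import objectify

# Hypothetical stand-in for labels.xml; element names follow the amara snippet.
labels_xml = b'''<labels>
  <label><name>Thomas Eliot</name><address><city>Stamford</city></address></label>
  <label><name>Ezra Pound</name><address><city>Long Lake</city></address></label>
</labels>'''

labels = objectify.fromstring(labels_xml)   # the root element is <labels> itself
lines = []
for l in labels.label:                      # iterating yields every <label> sibling
    lines.append('%s of %s' % (l.name, l.address.city))
print(lines)
```

Unlike amara's `bind_file`, `objectify.fromstring` returns the root element directly, so the loop starts at `labels.label` rather than `container.labels.label`.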

Regarding "python - Switching from amara to lxml in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4251750/
