python - How to get the path of every element in lxml using attributes

Tags: python xml lxml

I have the following code:

from lxml import etree

# new_xml is the root element, parsed elsewhere
tree = etree.ElementTree(new_xml)
for e in new_xml.iter():
    print(tree.getpath(e), e.text)

This gives me something like the following:

/Item/Purchases 

/Item/Purchases/Purchase[1] 
/Item/Purchases/Purchase[1]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[1]/Rating R

/Item/Purchases/Purchase[2] 
/Item/Purchases/Purchase[2]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[2]/Rating R

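However, I don't want paths that identify each element by its position in the list; I want paths that identify it by its attribute. The XML looks like this: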

<Item>
  <Purchases>
    <Purchase Country="US">
      <URL>http://tvgo.xfinity.com/watch/x/6091165US</URL>
      <Rating>R</Rating>
    </Purchase>
    <Purchase Country="CA">
      <URL>http://tvgo.xfinity.com/watch/x/6091165CA</URL>
      <Rating>R</Rating>
    </Purchase>
  </Purchases>
</Item>

How can I get the following paths instead?

/Item/Purchases 

/Item/Purchases/Purchase[@Country="US"]
/Item/Purchases/Purchase[@Country="US"]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[@Country="US"]/Rating R

/Item/Purchases/Purchase[@Country="CA"]
/Item/Purchases/Purchase[@Country="CA"]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[@Country="CA"]/Rating R

Best answer

It's not pretty, but it gets the job done.

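import re

replacements = {}

for e in tree.iter():
    path = tree.getpath(e)

    # Remember how each positional Purchase step maps to its attribute form.
    if re.search(r'/Purchase\[\d+\]$', path):
        new_predicate = '[@Country="' + e.attrib['Country'] + '"]'
        new_path = re.sub(r'\[\d+\]$', new_predicate, path)
        replacements[path] = new_path

    # Rewrite the Purchase prefix of this path and of any descendant path.
    for key, replacement in replacements.items():
        path = path.replace(key, replacement)

    # e.text is None for elements without text, so fall back to ''.
    print(path, (e.text or '').strip())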

For me, this prints:

/Item 
/Item/Purchases 
/Item/Purchases/Purchase[@Country="US"] 
/Item/Purchases/Purchase[@Country="US"]/URL http://tvgo.xfinity.com/watch/x/6091165US
/Item/Purchases/Purchase[@Country="US"]/Rating R
/Item/Purchases/Purchase[@Country="CA"] 
/Item/Purchases/Purchase[@Country="CA"]/URL http://tvgo.xfinity.com/watch/x/6091165CA
/Item/Purchases/Purchase[@Country="CA"]/Rating R
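
As an alternative to rewriting the output of getpath() with regular expressions, the path can be built top-down while walking the tree. The following is only a minimal sketch and not part of the original answer. The function name attribute_paths and its key_attr parameter are made up for illustration, and it uses the standard library's xml.etree.ElementTree so that it runs without lxml installed. It relies only on .tag, .get(), .text and child iteration, so it should work on lxml elements as well.

```python
import xml.etree.ElementTree as ET

XML = """<Item>
  <Purchases>
    <Purchase Country="US">
      <URL>http://tvgo.xfinity.com/watch/x/6091165US</URL>
      <Rating>R</Rating>
    </Purchase>
    <Purchase Country="CA">
      <URL>http://tvgo.xfinity.com/watch/x/6091165CA</URL>
      <Rating>R</Rating>
    </Purchase>
  </Purchases>
</Item>"""


def attribute_paths(element, key_attr="Country", prefix=""):
    """Yield (path, text) for element and all descendants, depth first.

    Hypothetical helper: an element that carries key_attr gets an
    [@key_attr="value"] predicate; every other element contributes
    only its tag name (no positional index is added).
    """
    step = "/" + element.tag
    value = element.get(key_attr)
    if value is not None:
        step += '[@%s="%s"]' % (key_attr, value)
    path = prefix + step
    # element.text is None when the element has no text at all.
    yield path, (element.text or "").strip()
    for child in element:
        # lxml also yields comments and processing instructions here;
        # their tag is not a string, so skip them.
        if isinstance(child.tag, str):
            yield from attribute_paths(child, key_attr, path)


root = ET.fromstring(XML)
paths = list(attribute_paths(root))
for path, text in paths:
    print(path, text)
```

Note the limits of this sketch: siblings that share a tag but lack the key attribute get identical paths, and attribute values containing a double quote would produce an invalid predicate. Both cases would need extra handling.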

Regarding "python - How to get the path of every element in lxml using attributes", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38936531/
