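python - Parsing XML in Python

Tags: python xml pandas

I want to parse the following XML: http://charts.realclearpolitics.com/charts/1044.xml. I would like the result in a DataFrame with 3 columns: date, approve, disapprove. The XML file is dynamic, since a new date is added every day, so the code should account for that. I have implemented a static solution, in which I have to hard-code the row numbers of the value tags in my loops. I would like to learn how to do this dynamically.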

import numpy as np
import pandas as pd
import requests
from pattern import web

xml = requests.get('http://charts.realclearpolitics.com/charts/1044.xml').text
dom = web.Element(xml)
values = dom.by_tag('value')

date = []
approve = []
disapprove = []

#The last range number below is 1720 instead of 1727 as last 6 values of Approve & Disapprove tag are blank.
for i in range(0,1720):
    date.append(pd.to_datetime(values[i].content))

#The last range number below is 3447 instead of 3454 as last 6 values are blank. Including till 3454 will give error while converting to float. 
for i in range(1727,3447):
    a = float(values[i].content)
    approve.append(a)

#The last range number below is 5174 instead of 5181 as last 6 values are blank.
for i in range(3454,5174):
    a = float(values[i].content)
    disapprove.append(a)

finalresult = pd.DataFrame({'date': date, 'Approve': approve, 'Disapprove': disapprove})
finalresult

Best answer

Here is one way to do it using lxml and XPath:

from lxml import etree
import pandas as pd

tree = etree.parse("http://charts.realclearpolitics.com/charts/1044.xml")

date = [s.text for s in tree.xpath("series/value")]
approve = [float(s.text) if s.text else 0.0
           for s in tree.xpath("graphs/graph[@title='Approve']/value")]
disapprove = [float(s.text) if s.text else 0.0
              for s in tree.xpath("graphs/graph[@title='Disapprove']/value")]

assert len(date) == len(approve) == len(disapprove)

finalresult = pd.DataFrame({'Date': date, 'Approve': approve, 'Disapprove': disapprove})
print(finalresult)

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1727 entries, 0 to 1726
Data columns (total 3 columns):
Date          1727  non-null values
Approve       1727  non-null values
Disapprove    1727  non-null values
dtypes: float64(2), object(1)
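
As a variation on the answer above, here is a minimal sketch that uses only the standard-library xml.etree.ElementTree instead of lxml and discovers the graph columns from their title attributes rather than naming them. The inline SAMPLE_XML and the helper names to_float and chart_to_frame are hypothetical, made up for illustration; the sample only mirrors the series/value and graphs/graph[@title]/value layout implied by the XPath expressions in the answer, and the root element name and values are assumptions. Unlike the answer, blank values become NaN rather than 0.0, so they can be told apart from real readings.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample; the real feed is assumed to share this layout.
SAMPLE_XML = """<chart>
  <series>
    <value>1/27/2009</value>
    <value>1/28/2009</value>
    <value>1/29/2009</value>
  </series>
  <graphs>
    <graph title="Approve">
      <value>63.3</value>
      <value>63.5</value>
      <value></value>
    </graph>
    <graph title="Disapprove">
      <value>20.0</value>
      <value>19.5</value>
      <value></value>
    </graph>
  </graphs>
</chart>"""


def to_float(text):
    """Convert element text to float; blank or missing text becomes NaN."""
    return float(text) if text and text.strip() else float("nan")


def chart_to_frame(xml_text):
    """Build a DataFrame with a Date column plus one column per <graph>."""
    root = ET.fromstring(xml_text)
    columns = {"Date": [v.text for v in root.findall("series/value")]}
    # One column per graph, named after its title attribute, so nothing
    # depends on how many rows the file currently holds.
    for graph in root.findall("graphs/graph"):
        columns[graph.get("title")] = [
            to_float(v.text) for v in graph.findall("value")
        ]
    # The constructor raises ValueError if the column lengths differ.
    frame = pd.DataFrame(columns)
    frame["Date"] = pd.to_datetime(frame["Date"])
    return frame


frame = chart_to_frame(SAMPLE_XML)
print(frame)
```

For the live feed, the text returned by requests.get(...).text (as in the question) could be passed to chart_to_frame in place of SAMPLE_XML, and frame.dropna() would discard the trailing rows that have no readings yet.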

Regarding "python - Parsing XML in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19343016/
