python - Extracting similar XML attributes with BeautifulSoup

Suppose I have the following XML:

<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
    <!-- Valid from 2017-07-29T08:00:00 to 2017-07-29T09:00:00 -->
    <symbol number="4" numberEx="4" name="Cloudy" var="04"/>
    <precipitation value="0"/>
    <!-- Valid at 2017-07-29T08:00:00 -->
    <windDirection deg="300.9" code="WNW" name="West-northwest"/>
    <windSpeed mps="1.3" name="Light air"/>
    <temperature unit="celsius" value="15"/>
    <pressure unit="hPa" value="1002.4"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
    <!-- Valid from 2017-07-29T09:00:00 to 2017-07-29T10:00:00 -->
    <symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
    <precipitation value="0"/>
    <!-- Valid at 2017-07-29T09:00:00 -->
    <windDirection deg="293.2" code="WNW" name="West-northwest"/>
    <windSpeed mps="0.8" name="Light air"/>
    <temperature unit="celsius" value="17"/>
    <pressure unit="hPa" value="1002.6"/>
</time>

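From it I want to collect the time from, symbol name, and temperature value, and then print them as: time from: symbol name, temperature value -- like this: 2017-07-29, 08:00:00: Cloudy, 15°.

(As you can see, this XML contains several name and value attributes.)

So far, my approach has been very simple: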

#!/usr/bin/env python
# coding: utf-8

import re
from BeautifulSoup import BeautifulSoup

# data is set to the above XML
soup = BeautifulSoup(data)
# collect the tags of interest into lists. can it be done wiser?
time_l = []
symb_l = []
temp_l = []
for i in soup.findAll('time'):
    i_time = str(i.get('from'))
    time_l.append(i_time)
for i in soup.findAll('symbol'):
    i_symb = str(i.get('name'))
    symb_l.append(i_symb)
for i in soup.findAll('temperature'):
    i_temp = str(i.get('value'))
    temp_l.append(i_temp)
# join the forecast lists to a dict
forc_l = []
for i, j in zip(symb_l, temp_l):
    forc_l.append([i, j])
rez = dict(zip(time_l, forc_l))
# combine and format the rezult. can this dict be printed simpler?
wew = ''
for key in sorted(rez):
    wew += re.sub("T", ", ", key) + str(rez[key])
wew = re.sub("'", "", wew)
wew = re.sub("\[", ": ", wew)
wew = re.sub("\]", "°\n", wew)
# print the rezult
print wew

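But I imagine there must be a better, smarter approach? Mostly I am interested in collecting attributes from the XML, and my way seems rather clumsy to me. Also, is there a simpler way to nicely print a dict such as {'a': '[b, c]'}?

I would be grateful for any hints or suggestions.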
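
For the dict-printing part of the question, one option is to format each entry directly instead of cleaning up the str() output with regular expressions. The following is only a minimal sketch: the sample dict mirrors the shape of `rez` above (timestamp mapped to a two-item list), and the helper name `format_entries` is made up for illustration.

```python
# Hypothetical sample data in the same shape as the question's `rez` dict:
# timestamp -> [symbol name, temperature value].
rez = {
    "2017-07-29T09:00:00": ["Partly cloudy", "17"],
    "2017-07-29T08:00:00": ["Cloudy", "15"],
}


def format_entries(mapping):
    """Build one formatted line per key, in sorted key order."""
    lines = []
    for key in sorted(mapping):
        name, value = mapping[key]  # unpack the two-item list
        stamp = key.replace("T", ", ")  # "2017-07-29T08:00:00" -> "2017-07-29, 08:00:00"
        lines.append("{}: {}, {}°".format(stamp, name, value))
    return lines


print("\n".join(format_entries(rez)))
```

Unpacking the list and using str.format avoids the quote and bracket characters ever appearing in the output, so no re.sub cleanup is needed.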

Best answer

from bs4 import BeautifulSoup
with open("sample.xml", "r") as f: # opening xml file
    content = f.read() # xml content stored in this variable
soup = BeautifulSoup(content, "lxml")
for values in soup.findAll("time"):
    print("{} : {}, {}°".format(values["from"], values.find("symbol")["name"], values.find("temperature")["value"]))

Output:

2017-07-29T08:00:00 : Cloudy, 15°
2017-07-29T09:00:00 : Partly cloudy, 17°
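
The output above keeps the ISO "T" separator, whereas the question asked for the form 2017-07-29, 08:00:00. As a rough sketch that is not part of the original answer, the same per-<time> loop can also be written with the standard library's xml.etree.ElementTree in place of BeautifulSoup, replacing the "T" while formatting. It assumes the fragment is wrapped in a single root element (a hypothetical <forecast> here), since ElementTree only parses documents with exactly one root; the function name `format_forecast` is likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample data from the question, wrapped in a hypothetical <forecast>
# root because ElementTree requires a single root element.
DATA = """<forecast>
<time from="2017-07-29T08:00:00" to="2017-07-29T09:00:00">
    <symbol number="4" numberEx="4" name="Cloudy" var="04"/>
    <temperature unit="celsius" value="15"/>
</time>
<time from="2017-07-29T09:00:00" to="2017-07-29T10:00:00">
    <symbol number="4" numberEx="4" name="Partly cloudy" var="04"/>
    <temperature unit="celsius" value="17"/>
</time>
</forecast>"""


def format_forecast(xml_text):
    """Return one 'date, time: symbol, temperature°' line per <time> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for time_el in root.iter("time"):
        # Look up children relative to this <time>, so the values stay paired.
        symbol = time_el.find("symbol").get("name")
        temperature = time_el.find("temperature").get("value")
        stamp = time_el.get("from").replace("T", ", ")
        lines.append("{}: {}, {}°".format(stamp, symbol, temperature))
    return lines


if __name__ == "__main__":
    print("\n".join(format_forecast(DATA)))
```

As in the answer, searching inside each <time> element keeps the symbol and temperature tied to the right timestamp, which three separate lists joined with zip cannot guarantee if one element is missing.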

Regarding python - Extracting similar XML attributes with BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45386116/
