<td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**" >
<div class="inner">
<div class="item">
<div class="view-item view-item-aisd_calendar">
<div class="calendar monthview">
<div class="calendar.4168.field_date.8.0 contents">
<a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a> <span class="date-display-single">7:00 pm</span> </div>
<div class="cutoff"> </div>
</div>
</div>
</div> </div>
</td>
I have the HTML above. I want to extract the "date" attribute (2014-04-28) and the "a href" link (Regular Board Meeting) from it. How can I do this with Python? Can it be done with Beautiful Soup?
Best answer
Here is how to do it with BeautifulSoup:
from bs4 import BeautifulSoup
data = """
<html>
<body>
<td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**" >
<div class="inner">
<div class="item">
<div class="view-item view-item-aisd_calendar">
<div class="calendar monthview">
<div class="calendar.4168.field_date.8.0 contents">
<a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a> <span class="date-display-single">7:00 pm</span> </div>
<div class="cutoff"> </div>
</div>
</div>
</div> </div>
</td>
</body>
</html>
"""
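soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly to avoid a warning
td = soup.body.td  # or soup.find('td', id='aisd_calendar-2014-04-28-0')
print(td['date'].strip('*'))  # strip the literal ** around the attribute value
link = soup.find('div', {'class': 'contents'}).a  # first <a> inside the "contents" div
print(link['href'])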
This prints:
2014-04-28
/event/2013/regular-board-meeting
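The question also asks for the link text ("Regular Board Meeting"), which the code above does not print. The following is a minimal sketch of one way to get it, not the answer's own method. It assumes the ** markers are literally present in the markup, as in the sample, and that a page may contain several such td cells. The helper name extract_events and the shortened SAMPLE markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample markup; wrapped in a table so the <td> is valid HTML.
SAMPLE = """
<table><tr>
<td class="single-day future" date="**2014-04-28**">
<div class="calendar.4168.field_date.8.0 contents">
<a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a>
<span class="date-display-single">7:00 pm</span>
</div>
</td>
</tr></table>
"""


def extract_events(html):
    """Return (date, title, href) tuples for every <td> that has a date attribute."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for td in soup.find_all("td", attrs={"date": True}):
        link = td.find("a", href=True)
        if link is None:
            continue  # a day cell without an event link
        date = td["date"].strip("*")
        title = link.get_text(strip=True).strip("*")
        events.append((date, title, link["href"]))
    return events


for event in extract_events(SAMPLE):
    print(event)
```

Looping over find_all instead of taking soup.body.td means the same code should work unchanged when the calendar contains more than one day cell.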
Also, if you need to convert the date to a Python datetime, you can use strptime():
from datetime import datetime
...
datetime.strptime(td['date'].strip('*'), '%Y-%m-%d')
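As a self-contained illustration of the conversion, the sketch below parses the date attribute on its own and, optionally, together with the time text from the span ("7:00 pm" in the sample). The helper name parse_event_datetime is hypothetical, and combining the date with the time goes beyond what the answer shows. Matching of %p depends on the locale; an English or C locale is assumed here.

```python
from datetime import datetime


def parse_event_datetime(date_attr, time_text=None):
    """Parse the cell's date attribute, optionally combined with its time text."""
    date_part = date_attr.strip("*")  # drop the literal ** markers from the sample
    if time_text is None:
        return datetime.strptime(date_part, "%Y-%m-%d")
    # %I is the 12-hour clock hour, %p the AM/PM marker (locale-dependent)
    combined = f"{date_part} {time_text.strip()}"
    return datetime.strptime(combined, "%Y-%m-%d %I:%M %p")


print(parse_event_datetime("**2014-04-28**"))
print(parse_event_datetime("**2014-04-28**", "7:00 pm"))
```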
Hope that helps.
Source: the Stack Overflow question "python - Navigating an HTML tree in Python": https://stackoverflow.com/questions/22607077/