python - Retrieving dates with BeautifulSoup

Tags: python beautifulsoup html-parsing

I am trying to retrieve dates from a product page: http://www.homedepot.com/p/Husky-41-in-16-Drawer-Tool-Chest-and-Cabinet-Set-HOTC4016B1QES/205080371

However, the dates are hidden in meta tags; see the first line of this excerpt:

<meta itemprop="datePublished" content="2014-11-27" />
</div><div id='80886327' itemprop="review" itemscope itemtype="http://schema.org/Review"><meta itemprop="itemReviewed" content="HUSKY 41 in. 16-Drawer Tool Chest and Cabinet Set" /><span itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">Rated <span itemprop="ratingValue">5</span> out of <span itemprop="bestRating">5</span></span> by <span itemprop="author">Razor</span><span itemprop="name"> solid construction
</span><span itemprop="description"> I spent the last month checking and looking at all tool boxes that I could find. Online and at available stores. In comparison to all, this is by far the best deal for the money. Quality, workmanship and construction of this is by far the best for the money. Some I looked at are twice as much money for the same quality... I have had this approx. a month and filled with tools and shop stuff and with the ball bearing drawers loaded, does not make any difference on drawer operation. Granted we still need the test of time..

Does anyone know how to save these dates into a list?

Accepted answer

You can use find_all() to get every meta tag with itemprop="datePublished":

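from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://www.homedepot.com/p/Husky-41-in-16-Drawer-Tool-Chest-and-Cabinet-Set-HOTC4016B1QES/205080371'
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(urlopen(url), 'html.parser')

# Collect the content attribute of every <meta itemprop="datePublished"> tag
print([meta.get('content') for meta in soup.find_all('meta', itemprop='datePublished')])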

Prints:

[
    '2014-11-27', 
    '2014-11-20', 
    '2014-12-15', 
    '2014-10-28', 
    '2014-10-10'
]
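If the dates are needed as date objects rather than strings, the same find_all() call can feed a small helper. This is only a sketch: SAMPLE_HTML is hypothetical markup standing in for the downloaded page, extract_dates is an illustrative name, and the content values are assumed to be in ISO YYYY-MM-DD form as in the excerpt above.

```python
from datetime import date
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded product page
SAMPLE_HTML = """
<div itemprop="review" itemscope itemtype="http://schema.org/Review">
  <meta itemprop="datePublished" content="2014-11-27" />
  <span itemprop="author">Razor</span>
</div>
<div itemprop="review" itemscope itemtype="http://schema.org/Review">
  <meta itemprop="datePublished" content="2014-11-20" />
</div>
"""

def extract_dates(html):
    """Return the datePublished values found in html as date objects."""
    soup = BeautifulSoup(html, 'html.parser')
    metas = soup.find_all('meta', itemprop='datePublished')
    # Skip tags without a content attribute, then parse the ISO date strings
    return [date.fromisoformat(m['content']) for m in metas if m.has_attr('content')]

dates = extract_dates(SAMPLE_HTML)
print(dates)
```

date.fromisoformat requires Python 3.7 or later; on older versions, datetime.strptime(value, '%Y-%m-%d').date() should do the same job.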

Alternatively, use a CSS selector:

print([meta.get('content') for meta in soup.select('meta[itemprop="datePublished"]')])
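
To check offline that the two approaches agree, both can be run against a small inline document. SAMPLE_HTML below is hypothetical markup, not the real page, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two date tags and one unrelated meta tag
SAMPLE_HTML = """
<meta itemprop="datePublished" content="2014-12-15" />
<meta itemprop="itemReviewed" content="HUSKY 41 in. 16-Drawer Tool Chest and Cabinet Set" />
<meta itemprop="datePublished" content="2014-10-28" />
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Keyword-argument attribute filter
via_find_all = [m.get('content') for m in soup.find_all('meta', itemprop='datePublished')]
# Equivalent CSS attribute selector
via_select = [m.get('content') for m in soup.select('meta[itemprop="datePublished"]')]

print(via_find_all == via_select)
```

Both lists come back in document order, so the choice between find_all() and select() here is mostly a matter of taste.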

Regarding python - retrieving dates with BeautifulSoup, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27716466/
