python - How do I get the contents of a class in HTML using BeautifulSoup?

Tags: python html web-scraping beautifulsoup html-parsing

This is the HTML I want to work with:

<section id='price'>

<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>

My question is how to get the Market Cap, Current Price, and Book Value values from the elements with class='col-sm-4'.

If I try:

print soup.row.col-sm-4.fa.fa-inr

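it doesn't work. I'm fairly new to Python and web scraping, so please bear with me. Thanks in advance.

Best answer

You can find each label by its text and then take its next_element: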

from bs4 import BeautifulSoup

data = """
<div class="row">
        <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
        <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
        <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
    </div>
"""
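# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, "html.parser")

titles = ['Market Cap', 'Current Price', 'Book Value']
for title in titles:
    # Find the text node that starts with the label; its next_element is the <b> tag
    label = soup.find(string=lambda x: x.startswith(title))
    print(label.next_element.get_text(strip=True))

This prints: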

10.64 Crores
35.35
53.52

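To get a float value, split the text of the <b> tag on whitespace and take the first element:

# `title` still holds the last label from the loop above
price = soup.find(string=lambda x: x.startswith(title)).next_element.text.split()[0]
print(float(price))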
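
As a minimal sketch of how the two steps above could be combined, the following collects all three numbers into a dictionary. The helper name `extract_values` and the use of `find_next("b")` in place of `next_element` are illustrative choices, not part of the original answer, and the code assumes every value starts with a number.

```python
from bs4 import BeautifulSoup

html = """
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
"""


def extract_values(markup, titles):
    """Hypothetical helper: map each label to the leading number in its <b> tag."""
    soup = BeautifulSoup(markup, "html.parser")
    values = {}
    for title in titles:
        # Locate the text node that starts with the label
        label = soup.find(string=lambda s: s.startswith(title))
        if label is None:
            continue  # label not present in this markup
        # The first <b> after the label holds the value
        bold = label.find_next("b")
        if bold is None:
            continue
        # "10.64 Crores" -> "10.64" -> 10.64
        values[title] = float(bold.get_text(strip=True).split()[0])
    return values


values = extract_values(html, ["Market Cap", "Current Price", "Book Value"])
print(values)
```

Labels that are missing from the markup are skipped rather than raising an error, which may or may not be what you want for your page.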

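You can also get them with a CSS selector:

# This selector expects the full page, where the div sits inside <section id="price">
for item in soup.select('section#price div.row h4.col-sm-4 b'):
    print(item.text)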
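
For context on why the attempt in the question fails: attribute access such as `soup.row` looks for a tag named `<row>`, not for a class, and Python reads `col-sm-4` as a subtraction. A small sketch of a class-based lookup with `find_all` and `class_` follows; the `results` dictionary and the way the label is split from the value are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<section id='price'>
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

results = {}
# class_ (with a trailing underscore) filters by CSS class, since `class` is a Python keyword
for heading in soup.find_all("h4", class_="col-sm-4"):
    # stripped_strings yields the non-empty text pieces, e.g. "Market Cap:" and "10.64 Crores"
    parts = list(heading.stripped_strings)
    label = parts[0].rstrip(":")
    results[label] = parts[-1]

for label, value in results.items():
    print(label, "->", value)
```

This version does not need the list of titles up front, at the cost of assuming each `h4` contains exactly a label followed by a value.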

For more on "python - How do I get the contents of a class in HTML using BeautifulSoup?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/27917477/
