web-scraping - How to limit the number of elements found by BeautifulSoup?

When scraping a web page with BeautifulSoup, is there a way to limit the number of elements returned by the find family of methods?

For example, if I only want the first 5 tags, can I do that with BeautifulSoup?

Best answer

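Both .find_all() and .select() return a list (a ResultSet, which subclasses Python's list), so you can slice the result, for example with [:5], to keep only the first 5 results: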

from bs4 import BeautifulSoup

txt = '''
<div>Tag 1</div>
<div>Tag 2</div>
<div>Tag 3</div>
<div>Tag 4</div>
<div>Tag 5</div>
<div>Tag 6</div>
<div>Tag 7</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

# Slice the result list to keep only the first five matches
for div in soup.find_all('div')[:5]:
    print(div.text)

Prints:

Tag 1
Tag 2
Tag 3
Tag 4
Tag 5
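Slicing still collects every match before discarding the extras. As a minimal sketch of an alternative, find_all() also accepts a limit argument that stops the search once that many matches have been found. The html string and the names first_five and texts below are illustrative and not part of the original answer:

```python
from bs4 import BeautifulSoup

# Illustrative markup: seven sibling <div> elements
html = "".join(f"<div>Tag {i}</div>" for i in range(1, 8))
soup = BeautifulSoup(html, "html.parser")

# limit=5 makes find_all() stop searching after five matches
first_five = soup.find_all("div", limit=5)
texts = [div.text for div in first_five]
print(texts)
```

For small documents the two approaches give the same result; limit mainly helps when the document is large and only a few matches are needed.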

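Edit: You can also use a CSS selector to pick the first 5 elements. Note that :nth-of-type counts among siblings, so this matches the first 5 div elements under each parent rather than the first 5 in the whole document: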

from bs4 import BeautifulSoup

txt = '''
<div>Tag 1</div>
<div>Tag 2</div>
<div>Tag 3</div>
<div>Tag 4</div>
<div>Tag 5</div>
<div>Tag 6</div>
<div>Tag 7</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

# -n+5 matches positions 1 through 5 among sibling <div> elements
for div in soup.select('div:nth-of-type(-n+5)'):
    print(div.text)

Prints:

Tag 1
Tag 2
Tag 3
Tag 4
Tag 5
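The selector gives the same output as slicing here only because all seven div elements share one parent. The following sketch, using made-up nested markup and hypothetical variable names, shows how the per-parent counting of :nth-of-type differs from capping the total with the limit argument of select():

```python
from bs4 import BeautifulSoup

# Illustrative markup: two parents, each holding three <div> children
html = """
<section><div>A1</div><div>A2</div><div>A3</div></section>
<section><div>B1</div><div>B2</div><div>B3</div></section>
"""
soup = BeautifulSoup(html, "html.parser")

# :nth-of-type counts within each parent, so both sections contribute
per_parent = [d.text for d in soup.select("div:nth-of-type(-n+2)")]

# limit caps the total number of matches across the whole document
overall = [d.text for d in soup.select("div", limit=2)]

print(per_parent)  # ['A1', 'A2', 'B1', 'B2']
print(overall)     # ['A1', 'A2']
```

Which one is appropriate depends on whether "first N" is meant per container or for the page as a whole.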

Regarding web-scraping - How to limit the number of elements found by BeautifulSoup?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62282157/
