python - find_all() returns only the first item of the list

Tags: python beautifulsoup

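I'm having trouble with the find_all() method in BeautifulSoup. I'm trying to get the text of every p tag, but I only get the text of the first one. In fact, the list that find_all() returns contains just one item. Why does find_all() return only one item?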

Here is part of the HTML I want to extract the text from:

<div class="post-content">
 <p>If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its quality at the same time.</p>

 <p>You can find it, and use for free <a href="https://deep-image.ai/">HERE</a></p>

 <p><em>The goal of this blog post is to focus on the main changes and showcase the results of DI 2.0 algorithms.</em></p>

 <p>As we all know a picture is worth a thousand words. So we will let the enhanced pictures speak for themselves. All pictures you can see below were processed using Deep Image algorithms.</p>

 <h2 id="what-has-changed">What has changed</h2>

 <p>Here are all the main improvements added to Deep Image 2.0:</p>
</div>

Here is my code:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://teonite.com/blog/deep-image-2-showcasing-results/').text
soup = BeautifulSoup(source, 'html.parser')

for article in soup.find_all(class_='post-content'):
    print(article.p.text)

Thanks for your help!

Best answer

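You are searching for every tag with the class post-content. There is only one such element, so find_all returns a list containing a single entry. Your for loop therefore runs exactly once, and article.p gives you only the first p tag inside that element, so only its text is printed.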
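To see this behaviour in isolation, here is a minimal sketch. The HTML is a shortened stand-in for the page in the question (the paragraph texts are placeholders, not the real content), and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real page; assumed to have the same structure.
html = """
<div class="post-content">
 <p>First paragraph.</p>
 <p>Second paragraph with a <a href="https://deep-image.ai/">link</a>.</p>
 <h2 id="what-has-changed">What has changed</h2>
 <p>Third paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# There is a single <div class="post-content">, so this list has one item.
matches = soup.find_all(class_="post-content")
match_count = len(matches)

# article.p is shorthand for article.find('p'): it yields only the first <p>.
first_only = [article.p.text for article in matches]

print(match_count)
print(first_only)
```

The loop body runs once per matching div, not once per paragraph, which is why only one line is printed.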

Try this:

from bs4 import BeautifulSoup

html = '''
<div class="post-content">
 <p>If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its quality at the same time.</p>

 <p>You can find it, and use for free <a href="https://deep-image.ai/">HERE</a></p>

 <p><em>The goal of this blog post is to focus on the main changes and showcase the results of DI 2.0 algorithms.</em></p>

 <p>As we all know a picture is worth a thousand words. So we will let the enhanced pictures speak for themselves. All pictures you can see below were processed using Deep Image algorithms.</p>

 <h2 id="what-has-changed">What has changed</h2>

 <p>Here are all the main improvements added to Deep Image 2.0:</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.find(class_='post-content')
for p in div.find_all('p'):
    print(p.text)

This prints the text of every p tag, because we now first locate the element with the post-content class and then search for all p tags inside that element.
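As an alternative sketch (not part of the original answer), the same two-step lookup can be written as one CSS selector with select(), which BeautifulSoup supports through the soupsieve package. The HTML below is again a shortened placeholder for the real page.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real page; assumed to have the same structure.
html = """
<div class="post-content">
 <p>First paragraph.</p>
 <p>Second paragraph with a <a href="https://deep-image.ai/">link</a>.</p>
 <h2 id="what-has-changed">What has changed</h2>
 <p>Third paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".post-content p" matches every <p> that is a descendant of an element
# with class "post-content"; the <h2> is skipped.
paragraphs = [p.get_text().strip() for p in soup.select(".post-content p")]

for text in paragraphs:
    print(text)
```

Unlike soup.find(...), which returns None when nothing matches and would make the following .find_all('p') call fail, select() simply returns an empty list in that case.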

This question, "python - find_all() returns only the first item of the list", comes from Stack Overflow: https://stackoverflow.com/questions/57310276/
