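I'm having trouble with BeautifulSoup's find_all() method. I'm trying to get the text between all of the p tags, but I only get the first element of the list. In fact, the list contains only one item. Why does find_all() return only one item?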
Here is part of the HTML I'm trying to extract from:
<div class="post-content">
<p>If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its quality at the same time.</p>
<p>You can find it, and use for free <a href="https://deep-image.ai/">HERE</a></p>
<p><em>The goal of this blog post is to focus on the main changes and showcase the results of DI 2.0 algorithms.</em></p>
<p>As we all know a picture is worth a thousand words. So we will let the enhanced pictures speak for themselves. All pictures you can see below were processed using Deep Image algorithms.</p>
<h2 id="what-has-changed">What has changed</h2>
<p>Here are all the main improvements added to Deep Image 2.0:</p>
</div>
Here is my code:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://teonite.com/blog/deep-image-2-showcasing-results/').text
soup = BeautifulSoup(source, 'html.parser')
for article in soup.find_all(class_='post-content'):
    print(article.p.text)
Thanks for your help!
Best answer
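You are searching for all tags with the class post-content. There is only one such element, so find_all returns a list containing a single entry. Your for loop therefore runs only once, and in that one iteration it prints the text of only the first p tag, because article.p is shorthand for article.find('p'), which returns just the first match.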
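To make the diagnosis concrete, here is a small sketch that reproduces the behaviour. It uses a shortened stand-in for the question's markup (an assumption: the live page has a single div with the post-content class), and shows that find_all returns one match while .p picks out only the first paragraph.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's HTML (assumption: the real page
# wraps the article in exactly one <div class="post-content">).
html = """
<div class="post-content">
<p>First paragraph.</p>
<p>Second paragraph.</p>
<h2>Heading</h2>
<p>Third paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet of every matching tag;
# only one <div> carries the class, so it holds a single element.
divs = soup.find_all(class_="post-content")
print(len(divs))

# Attribute access such as .p is shorthand for .find("p"): it returns
# only the first matching descendant, never all of them.
first_p = divs[0].p
print(first_p.text)
print(first_p is divs[0].find("p"))
```

The loop in the question therefore has exactly one iteration, and each iteration can only ever reach the first paragraph.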
Try this:
from bs4 import BeautifulSoup
import requests
html = '''
<div class="post-content">
<p>If you’re not familiar with Deep Image, it’s an amazing tool which allows you to increase the size of an image and upgrade its quality at the same time.</p>
<p>You can find it, and use for free <a href="https://deep-image.ai/">HERE</a></p>
<p><em>The goal of this blog post is to focus on the main changes and showcase the results of DI 2.0 algorithms.</em></p>
<p>As we all know a picture is worth a thousand words. So we will let the enhanced pictures speak for themselves. All pictures you can see below were processed using Deep Image algorithms.</p>
<h2 id="what-has-changed">What has changed</h2>
<p>Here are all the main improvements added to Deep Image 2.0:</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
div = soup.find(class_='post-content')
for p in div.find_all('p'):
    print(p.text)
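This gives you the desired output, the text inside every p tag, because we now first find the element with the post-content class and then search for all p tags inside that element.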
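An equivalent approach, offered as an alternative sketch rather than as part of the original answer, is Beautiful Soup's select() method, which takes a CSS selector. The descendant selector below matches every p nested inside the post-content div in one call; the sample markup is a shortened, hypothetical version of the question's HTML, with an extra paragraph outside the div to show it is excluded.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup modelled on the question's post-content div,
# plus one paragraph outside it that should NOT be collected.
html = """
<div class="post-content">
<p>Intro paragraph.</p>
<p>You can find it <a href="https://deep-image.ai/">HERE</a></p>
<h2 id="what-has-changed">What has changed</h2>
<p>Here are all the main improvements:</p>
</div>
<p>A paragraph outside the post.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.post-content p" matches every <p> anywhere inside the div, in
# document order. get_text(" ", strip=True) joins the stripped text pieces
# with a space so the link text is not glued to the preceding words.
paragraphs = [p.get_text(" ", strip=True) for p in soup.select("div.post-content p")]

for text in paragraphs:
    print(text)
```

Both versions return all three paragraphs inside the div; the choice between find/find_all and select is mostly a matter of taste.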
Source: the Stack Overflow question "python - find_all() only returns the first item of the list": https://stackoverflow.com/questions/57310276/