我正在尝试抓取此网页上的所有文章:https://www.coindesk.com/category/markets-news/markets-markets-news/markets-bitcoin/
我可以抓取第一篇文章,但需要帮助了解如何跳转到下一篇文章并抓取那里的信息。预先感谢您的支持。
import requests
from bs4 import BeautifulSoup
class Content:
def __init__(self,url,title,body):
self.url = url
self.title = title
self.body = body
def getPage(url):
req = requests.get(url)
return BeautifulSoup(req.text, 'html.parser')
# Scaping news articles from Coindesk
def scrapeCoindesk(url):
bs = getPage(url)
title = bs.find("h3").text
body = bs.find("p",{'class':'desc'}).text
return Content(url,title,body)
# Pulling the article from coindesk
url = 'https://www.coindesk.com/category/markets-news/markets-markets-news/markets-bitcoin/'
content = scrapeCoindesk(url)
print ('Title:{}'.format(content.title))
print ('URl: {}\n'.format(content.url))
print (content.body)
最佳答案
您可以利用每篇文章都包含在 div.article
中这一事实来迭代它们:
def scrapeCoindesk(url):
bs = getPage(url)
articles = []
for article in bs.find_all("div", {"class": "article"}):
title = article.find("h3").text
body = article.find("p", {"class": "desc"}).text
article_url = article.find("a", {"class": "fade"})["href"]
articles.append(Content(article_url, title, body))
return articles
# Pulling the article from coindesk
url = 'https://www.coindesk.com/category/markets-news/markets-markets-news/markets-bitcoin/'
content = scrapeCoindesk(url)
for article in content:
print(article.url)
print(article.title)
print(article.body)
print("-------------")
关于python - 使用 python 请求和漂亮的汤进行网页抓取,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50254397/