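I am trying to print every article link from this website, but each link is printed twice and only 5 of the articles are covered.
I tried increasing the range to (1,20), which prints all ten article links, but each link is still printed twice.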
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("https://www.politico.com/newsletters/playbook/archive")
target = 'C:/Users/k/Politico/pol.csv'
content = url.read()
soup = BeautifulSoup(content, "lxml")
for article in range(1, 10):
    # Prints each article's link and saves to csv file
    print(soup('article')[article]('a', {'target': '_top'}))
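I expect the output to be 10 article links, with none of them duplicated.
Best answer
You can use the CSS selector .front-list h3 > a, which matches only anchors that are direct children of an h3 inside an element with the class front-list.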
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.politico.com/newsletters/playbook/archive#')
soup = bs(r.content, 'lxml')
links = [link['href'] for link in soup.select('.front-list h3 > a')]
print(links)
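To see why a narrower selector helps, here is a minimal offline sketch. The markup in SAMPLE_HTML is hypothetical and is not copied from politico.com; it simply assumes that each article repeats its link (once in the h3 headline and once elsewhere), which is one plausible cause of the doubled output. The helper names extract_links and all_top_links are made up for illustration, and html.parser is used here only to avoid the extra lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): every article
# carries the same link twice, in the headline and in a "Read more" anchor.
SAMPLE_HTML = """
<ul class="front-list">
  <li><article>
    <h3><a href="/playbook/a" target="_top">Story A</a></h3>
    <p><a href="/playbook/a" target="_top">Read more</a></p>
  </article></li>
  <li><article>
    <h3><a href="/playbook/b" target="_top">Story B</a></h3>
    <p><a href="/playbook/b" target="_top">Read more</a></p>
  </article></li>
</ul>
"""


def all_top_links(html):
    """Collect every anchor with target="_top", as the question's code does."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", {"target": "_top"})]


def extract_links(html):
    """Collect only headline links, then drop repeats while keeping order."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = [a["href"] for a in soup.select(".front-list h3 > a[href]")]
    # dict.fromkeys keeps the first occurrence of each href in order.
    return list(dict.fromkeys(hrefs))


print(all_top_links(SAMPLE_HTML))   # every link appears twice here
print(extract_links(SAMPLE_HTML))   # one link per article
```

The dict.fromkeys step is an extra safeguard rather than part of the original answer: if the selector alone already returns one link per article on the real page, it changes nothing.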
Source: "python - How to stop articles from printing twice using BeautifulSoup", a similar question found on Stack Overflow: https://stackoverflow.com/questions/56087919/