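I'm trying to get the HTML source of every (specified) hyperlink on a page. The page is https://dota2.gamepedia.com/Category:Counters , and the subsequent page sources I'm trying to retrieve are https://dota2.gamepedia.com/Abaddon/Counters , https://dota2.gamepedia.com/Alchemist/Counters ... and so on.
I tried the following code, but it gave no results: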
from bs4 import BeautifulSoup
import requests

source = requests.get('https://dota2.gamepedia.com/Category:Counters').text
soup = BeautifulSoup(source, 'lxml')
links = soup.find_all('div', class_="mw-category-group")
for c in links:
    b = c.find_all('a')
    for a in b:
        u = a.get('href')
        url = "https://dota2.gamepedia.com" + u
        # print("https://dota2.gamepedia.com" + u)
        for sources in url:
            sources = requests.get(url).text
            soup = BeautifulSoup(sources, "lxml")
            print(sources)
            #
        # print(url)
Best answer
Use a CSS selector; it is simple and fast. I've added a few print statements to confirm we are on the right track.
from bs4 import BeautifulSoup
import requests

source = requests.get('https://dota2.gamepedia.com/Category:Counters').text
soup = BeautifulSoup(source, 'lxml')

# Select every link inside the category listing and fetch each linked page.
for link in soup.select(".mw-category-group a"):
    url = "https://dota2.gamepedia.com" + link['href']
    print(url)
    sources = requests.get(url).text
    # Parse the linked page under a separate name so the category soup is not overwritten.
    page = BeautifulSoup(sources, "lxml")
    print("Page Header of Subsequest page")
    print(page.select_one("#firstHeading").text)
Output: given the print statements, your console output will look like this.
https://dota2.gamepedia.com/Abaddon/Counters
Page Header of Subsequest page
Abaddon/Counters
https://dota2.gamepedia.com/Alchemist/Counters
Page Header of Subsequest page
Alchemist/Counters
https://dota2.gamepedia.com/Ancient_Apparition/Counters
Page Header of Subsequest page
Ancient Apparition/Counters
https://dota2.gamepedia.com/Anti-Mage/Counters
Page Header of Subsequest page
Anti-Mage/Counters
https://dota2.gamepedia.com/Arc_Warden/Counters
Page Header of Subsequest page
Arc Warden/Counters
https://dota2.gamepedia.com/Axe/Counters
Page Header of Subsequest page
Axe/Counters
https://dota2.gamepedia.com/Bane/Counters
Page Header of Subsequest page
Bane/Counters
https://dota2.gamepedia.com/Batrider/Counters
Page Header of Subsequest page
Batrider/Counters
https://dota2.gamepedia.com/Beastmaster/Counters
Page Header of Subsequest page
Beastmaster/Counters
https://dota2.gamepedia.com/Bloodseeker/Counters
Page Header of Subsequest page
Bloodseeker/Counters
https://dota2.gamepedia.com/Bounty_Hunter/Counters
Page Header of Subsequest page
Bounty Hunter/Counters
https://dota2.gamepedia.com/Brewmaster/Counters
Page Header of Subsequest page
Brewmaster/Counters
https://dota2.gamepedia.com/Bristleback/Counters
Page Header of Subsequest page
Bristleback/Counters
https://dota2.gamepedia.com/Broodmother/Counters
Page Header of Subsequest page
Broodmother/Counters
https://dota2.gamepedia.com/Centaur_Warrunner/Counters
Page Header of Subsequest page
Centaur Warrunner/Counters
https://dota2.gamepedia.com/Chaos_Knight/Counters
Page Header of Subsequest page
Chaos Knight/Counters
https://dota2.gamepedia.com/Chen/Counters
Page Header of Subsequest page
Chen/Counters
https://dota2.gamepedia.com/Clinkz/Counters
Page Header of Subsequest page
Clinkz/Counters
https://dota2.gamepedia.com/Clockwerk/Counters
Page Header of Subsequest page
Clockwerk/Counters
https://dota2.gamepedia.com/Crystal_Maiden/Counters
Page Header of Subsequest page
Crystal Maiden/Counters
https://dota2.gamepedia.com/Dark_Seer/Counters
Page Header of Subsequest page
and so on...
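One detail in the question's code is worth spelling out: `for sources in url` loops over a single string, so it walks through the characters of the URL rather than a collection of URLs. The sketch below illustrates this and shows one way to collect the links into a list first. The `hrefs` list is hypothetical, standing in for the values that `a.get('href')` would return, and `urljoin` from the standard library is a substitute for the plain string concatenation used above, not something the original code used.

```python
from urllib.parse import urljoin

BASE = "https://dota2.gamepedia.com"

# Hypothetical href values, standing in for what a.get('href') returns.
hrefs = ["/Abaddon/Counters", "/Alchemist/Counters"]

# Pitfall: iterating over a single string yields its characters, not URLs.
url = BASE + hrefs[-1]
first_items = [item for item in url][:5]
print(first_items)

# Collect every absolute URL in a list, then loop over that list instead.
urls = [urljoin(BASE, href) for href in hrefs]
for page_url in urls:
    print(page_url)
```

With the URLs in a list, each one can be passed to `requests.get` exactly once, which is effectively what the loop over `soup.select(...)` in the answer does.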
Regarding python - parsing all HTML sources of the hyperlinks in a page, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58917614/