python - Problem web scraping with BeautifulSoup

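I am trying to extract the list of teams from the Brazilian football championship table. I search for what appear to be the correct tag and class, but nothing is returned. I tried reading the official documentation on the BS4 website but still could not solve the problem. I would be grateful if someone could help. Below are the screenshots and the code I used.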

Using the element inspector to find the selector: /image/asMng.png

The search that returns no data: /image/U2S5W.png

from bs4 import BeautifulSoup
import requests  # the 'lxml' parser only has to be installed, not imported

r = requests.get('https://www.google.com/search?q=Tabela+do+Campeonato+Brasileiro+de+Futebol&oq=Tabela+do+Campeonato+Brasileiro+de+Futebol&aqs=chrome..69i57.241j0j1&sourceid=chrome&ie=UTF-8#sie=lg;/g/11fmzksb3y;2;/m/0fnk7q;st;fp;1;;')

page = r.text
soup = BeautifulSoup(page, 'lxml')

for i in soup.find_all('span', class_='ellipsisize hsKSJe'):
    print(i.text)

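Best answer

I believe the problem is that you are trying to get data from a dynamic page with Beautiful Soup: the table is built by JavaScript in the browser, so it is not in the HTML that requests downloads. To get it, you can use selenium with chromedriver. I keep the driver in its own folder on the system drive (bin\chromedriver.exe).

For example, the code below gets you the first five rows (I did not have the perseverance to work out the selectors for everything else, sorry!)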

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

URL = 'https://www.google.com/search?q=Tabela+do+Campeonato+Brasileiro+de+Futebol&oq=Tabela+do+Campeonato+Brasileiro+de+Futebol&aqs=chrome..69i57.241j0j1&sourceid=chrome&ie=UTF-8#sie=lg;/g/11fmzksb3y;2;/m/0fnk7q;st;fp;1;;'
# start Chrome through chromedriver and load the dynamic page
# (Selenium 4 style; adjust the driver path to your own setup)
dr = webdriver.Chrome(service=Service(r'C:/bin/chromedriver.exe'))
dr.get(URL)
# get the rendered results container by CSS selector
data = dr.find_element(By.CSS_SELECTOR, '#rso').get_attribute('outerHTML')
dr.quit()

# get data as dataframe (recent pandas expects a file-like object here)
raw = pd.read_html(StringIO(data))[0]
# organize retrieved columns
labels = raw.columns.values[1:11]
table = raw[labels].copy()
# delete excess column
del table['Club']
table.columns = labels[:-1]  # ignore the last value

# view table (can't post an image yet! new here :))
print(table)
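
To illustrate why the original find_all call comes back empty, here is a minimal sketch that runs the same selector against two small HTML strings. Both strings are made up for illustration (they are not Google's real markup), and the helper name team_names is hypothetical: one string stands in for what requests downloads before any JavaScript runs, the other for the DOM after the browser has rendered the table.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw response: the container is empty and
# the table would only be created later by a script in the browser.
STATIC_HTML = """
<html><body>
  <div id="rso"></div>
  <script>/* the table would be built here by JavaScript */</script>
</body></html>
"""

# Hypothetical stand-in for the DOM after the scripts have run.
RENDERED_HTML = """
<html><body>
  <div id="rso">
    <table>
      <tr><td><span class="ellipsisize hsKSJe">Team A</span></td></tr>
      <tr><td><span class="hsKSJe ellipsisize">Team B</span></td></tr>
    </table>
  </div>
</body></html>
"""


def team_names(html, selector="span.ellipsisize.hsKSJe"):
    """Return the text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # select() matches elements carrying both classes, in any order
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


print(team_names(STATIC_HTML))    # []
print(team_names(RENDERED_HTML))  # ['Team A', 'Team B']
```

With the real page you would pass r.text from requests for the first case and the driver's page_source (or the outerHTML fetched above) for the second. If the first list is empty and the second is not, the selector is fine and the data is simply missing from the static response.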

Regarding python - Problem web scraping with BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63963164/
