Python web scraping: Beautiful Soup

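I'm having trouble scraping a web page. I'm trying to get the point difference between the two teams (for example: +2, +1, ...), but when I apply the find_all method it returns an empty list.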

from bs4 import BeautifulSoup
from requests import get

url = 'https://www.mismarcadores.com/partido/Q942gje8/#punto-a-punto;1'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# Look for the score-difference spans (this comes back as an empty list)
print(html_soup.find_all('span', class_='match-history-diff-score-inc'))
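A minimal offline sketch of why `find_all` comes back empty. It assumes, hypothetically, that the HTML `requests` downloads contains only an empty container, while the DOM a browser builds after running the page's JavaScript contains the score spans. Both HTML strings and the helper `inc_spans` are invented for illustration; they are not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML as served by the server: the table is filled in later by JavaScript
raw_html = '<div id="detail"><table id="parts"></table></div>'

# Hypothetical HTML after a browser has executed the page's JavaScript
rendered_html = """
<div id="detail"><table id="parts">
  <tr><td><span class="match-history-diff-score-inc">+2</span></td></tr>
  <tr><td><span class="match-history-diff-score-dec">+1</span></td></tr>
</table></div>
"""


def inc_spans(html):
    """Return the text of every span carrying the class used in the question."""
    soup = BeautifulSoup(html, "html.parser")
    return [s.get_text() for s in soup.find_all("span", class_="match-history-diff-score-inc")]


print(inc_spans(raw_html))       # [] because the spans are not in the downloaded HTML
print(inc_spans(rendered_html))  # ['+2']
```

Against the real response, a quick check such as `'match-history-diff-score' in response.text` returning `False` suggests the data is not in the static HTML at all, so no selector change will help.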

Best answer

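The problem you are running into is that the web content is generated dynamically by JavaScript. requests cannot execute JavaScript, so you are better off using something like Selenium.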
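If you would rather keep parsing with Beautiful Soup, one option is to let Selenium render the page and hand the resulting `driver.page_source` to BeautifulSoup. This is a sketch, not a tested scraper for this site: it assumes Chrome and a matching driver are available, that the class names from this answer still match the live page, and the helper names `fetch_rendered_html` and `parse_diffs` are hypothetical.

```python
from bs4 import BeautifulSoup

URL = "https://www.mismarcadores.com/partido/Q942gje8/#punto-a-punto;1"
PREFIX = "match-history-diff-score-"  # assumed class prefix, taken from this answer


def fetch_rendered_html(url, timeout=10):
    """Load the page in Chrome, wait for a score span, and return the rendered DOM."""
    # Imported lazily so parse_diffs() can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until JavaScript has inserted at least one matching span
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, f'span[class^="{PREFIX}"]'))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_diffs(html):
    """Extract the score-difference strings from already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in soup.select(f'td span[class^="{PREFIX}"]')]


# Usage (needs Chrome and network access):
# print(parse_diffs(fetch_rendered_html(URL)))
```

Splitting fetching from parsing keeps the Beautiful Soup part testable on saved HTML, without launching a browser.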

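Edit: following @λuser's suggestion, I changed my answer to use only Selenium, locating the elements you want with XPath. Note that I used the XPath function starts-with() to match both match-history-diff-score-dec and match-history-diff-score-inc. Selecting only one of them would make you miss almost half of the relative score updates. That is why the output contains 103 results instead of 56.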
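The effect of matching on the class prefix instead of one exact class can be reproduced offline with Beautiful Soup. The HTML below is a hypothetical three-row sample, and passing a function as `class_` is used here as a counterpart to XPath `starts-with(@class, ...)`; Beautiful Soup applies the function to each individual class value.

```python
from bs4 import BeautifulSoup

PREFIX = "match-history-diff-score-"

# Hypothetical rendered rows using either the -inc or the -dec class
html = """
<table>
  <tr><td><span class="match-history-diff-score-inc">+2</span></td></tr>
  <tr><td><span class="match-history-diff-score-dec">+1</span></td></tr>
  <tr><td><span class="match-history-diff-score-inc">+2</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact class match: only the -inc spans are found
only_inc = soup.find_all("span", class_="match-history-diff-score-inc")

# Prefix match on the class value, similar to XPath starts-with(@class, PREFIX)
both = soup.find_all("span", class_=lambda c: c is not None and c.startswith(PREFIX))

print(len(only_inc), len(both))      # 2 3
print([s.get_text() for s in both])  # ['+2', '+1', '+2']
```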

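from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Give the JavaScript-generated elements up to 10 s to appear
driver.implicitly_wait(10)
driver.get("https://www.mismarcadores.com/partido/Q942gje8/#punto-a-punto;1")

# starts-with() matches both the ...-inc and the ...-dec spans
# (Selenium 4 replaced find_elements_by_xpath with find_elements(By.XPATH, ...))
table = driver.find_elements(By.XPATH, '//td//span[starts-with(@class, "match-history-diff-score-")]')

results = []
for tag in table:
    # The innerHTML of each span is the score difference, e.g. '+2'
    results.append(tag.get_attribute('innerHTML'))
print(results)

driver.quit()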

This outputs:

['+2', '+1', '+2', '+2', '+1', '+2', '+4', '+2', '+2', '+4', '+7', '+5', '+8', '+5', '+7', '+5', '+3', '+2', '+5', '+3', '+5', '+3', '+5', '+6', '+4', '+6', '+7', '+6', '+5', '+2', '+4', '+2', '+5', '+7', '+6', '+8', '+5', '+3', '+1', '+2', '+1', '+4', '+7', '+5', '+8', '+6', '+9', '+11', '+10', '+9', '+11', '+9', '+10', '+11', '+9', '+7', '+5', '+3', '+2', '+1', '+3', '+1', '+3', '+2', '+1', '+3', '+2', '+4', '+1', '+2', '+3', '+6', '+3', '+5', '+2', '+1', '+1', '+2', '+4', '+3', '+2', '+4', '+1', '+3', '+5', '+7', '+5', '+8', '+7', '+6', '+5', '+4', '+1', '+4', '+6', '+9', '+7', '+9', '+7', '+10', '+11', '+12', '+10']

A similar question about Python web scraping: Beautiful Soup can be found on Stack Overflow: https://stackoverflow.com/questions/50319091/
