python - How to wait a second before saving the soup element with BeautifulSoup so that elements finish loading on the page

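I'm trying to scrape data from THIS WEBSITE. Some products have 3 prices (muted price, red price and black price), and I've noticed that when a product has 3 prices, the red price changes while the page is loading.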

When I scrape the website I only get two prices. I think that if the code waited until the page had fully loaded, I would get all the prices.

Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")

# Muted Price (crossed-out list price)
MutedPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-listPriceValue ph2 dib strike custom-list-price fw5 exito-vtex-component-precio-tachado'})[0].text
# Drop the first two characters (currency prefix) and the '.' thousands separators
MutedPrice = pd.to_numeric(MutedPrice[2-len(MutedPrice):].replace('.', ''))

# Red Price
RedPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-sellingPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-rojo'})[0].text
RedPrice = pd.to_numeric(RedPrice[2-len(RedPrice):].replace('.', ''))

# Black Price
BlackPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-alliedPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-negro'})[0].text
BlackPrice = pd.to_numeric(BlackPrice[2-len(BlackPrice):].replace('.', ''))

print('Muted Price:', MutedPrice)
print('Red Price:', RedPrice)
print('Black Price:', BlackPrice)

Actual result:
Muted Price: 3199900
Red Price: 1649868
Black Price: 0

Expected result:
Muted Price: 3199900
Red Price: 1550032
Black Price: 1649868
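The slice `[2-len(s):]` in the code above only works if every price string starts with exactly two characters to discard (presumably "$ ") and uses "." as the thousands separator. A slightly more forgiving alternative is a small helper that keeps only the digits. The name `parse_price` is hypothetical, and the sketch assumes prices are whole pesos with no decimal part, formatted like "$ 3.199.900":

```python
import re


def parse_price(text):
    """Hypothetical helper: turn a price string such as '$ 3.199.900' into an int.

    Assumes the price has no decimal part, so every digit belongs to the amount
    and everything else (currency sign, spaces, '.' separators) can be dropped.
    """
    digits = re.sub(r"\D", "", text)  # keep only the digit characters
    if not digits:
        raise ValueError(f"no digits in price text: {text!r}")
    return int(digits)


print(parse_price("$ 3.199.900"))  # 3199900
```

`parse_price(...)` could then replace the slice, `.replace('.', '')` and `pd.to_numeric` on each of the three price lines.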

Best answer

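These values are probably rendered dynamically, i.e. they may be filled in by JavaScript running in the page.
requests.get() simply returns the markup received from the server, without any of the changes made later on the client side, so waiting before parsing would not help.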
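One quick way to test this explanation is to search the raw response for a price you can see in the browser. The sketch below assumes, based on the question's slicing code, that prices appear in the HTML with "." as the thousands separator (e.g. 1.550.032). The helper name and the sample fragment are made up for illustration:

```python
def price_in_static_html(html, price):
    """Hypothetical check: does `price` appear in the raw HTML, formatted with
    '.' as the thousands separator (e.g. 1550032 -> '1.550.032')?"""
    formatted = f"{price:,}".replace(",", ".")
    return formatted in html


# Made-up fragment standing in for requests.get(url).text
sample = '<span class="exito-vtex-component-precio-rojo">$ 1.649.868</span>'
print(price_in_static_html(sample, 1649868))  # True
print(price_in_static_html(sample, 1550032))  # False
```

If `price_in_static_html(req.text, 1550032)` returns False on the real page, the expected red price is not in the server's markup, and no amount of waiting after `requests.get()` will make BeautifulSoup find it. If it returns True, the value is in the markup and the selector is the more likely problem.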

You may be able to use Selenium with the Chrome WebDriver to load the page URL and then get the page source (or you can use the Firefox driver).

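Go to chrome://settings/help to check your current Chrome version, and download the driver for that version from here. Make sure the driver file is saved in a directory on your PATH or in the same folder as your Python script.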

Try replacing the `url`, `requests.get()` and `BeautifulSoup(...)` lines of your existing code with the following:

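from bs4 import BeautifulSoup
from selenium.webdriver import Chrome  # pip install selenium
from selenium.webdriver.chrome.service import Service

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'

# Use Chrome to get the page with its JavaScript-generated content.
# Selenium 4 takes the driver path through Service; the old executable_path
# argument was deprecated and later removed.
browser = Chrome(service=Service("./chromedriver"))
try:
    browser.get(url)
    page_source = browser.page_source
finally:
    browser.quit()  # shut down both the browser and the chromedriver process

soup = BeautifulSoup(page_source, "lxml")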

Output:

Muted Price: 3199900
Red Price: 1550032
Black Price: 1649868
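Reading `page_source` immediately after `browser.get(url)` does not guarantee that the page's scripts have finished. It worked here, but a slower load could still return the placeholder values. Selenium's `WebDriverWait(driver, timeout).until(condition)` repeatedly calls `condition(driver)` until it returns something truthy or the timeout expires. The question's "Black Price: 0" suggests that the black-price span is present with a 0 placeholder before JavaScript fills it in. If so, waiting for the element to merely exist is not enough. The hypothetical condition below waits for a non-zero value instead. The CSS selector is an assumption built from one of the classes in the question's code:

```python
import re

# Assumed selector: one of the classes on the black-price span in the question.
BLACK_PRICE_CSS = "span.exito-vtex-component-precio-negro"


def black_price_loaded(driver):
    """Hypothetical wait condition for WebDriverWait.until().

    Returns the black price as an int once it is non-zero; otherwise returns
    False so that WebDriverWait keeps polling.
    """
    # "css selector" is the value of selenium's By.CSS_SELECTOR constant
    elements = driver.find_elements("css selector", BLACK_PRICE_CSS)
    if not elements:
        return False
    digits = re.sub(r"\D", "", elements[0].text)
    price = int(digits) if digits else 0
    return price if price > 0 else False
```

In the answer's code this would go between `browser.get(url)` and reading `browser.page_source`, for example `WebDriverWait(browser, 10).until(black_price_loaded)` after `from selenium.webdriver.support.ui import WebDriverWait`. If the price never becomes non-zero within 10 seconds, `until` raises `TimeoutException`.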

References:

Get page generated with Javascript in Python

selenium - chromedriver executable needs to be in PATH

A similar question on Stack Overflow about "python - How to wait a second before saving the soup element with BeautifulSoup so that elements finish loading on the page": https://stackoverflow.com/questions/58676379/