Python, BS and Selenium

Tags: python selenium web-scraping beautifulsoup

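I am trying to scrape a page whose content is rendered dynamically with JavaScript, using BeautifulSoup and Python. I have read a lot in order to write this code. For example, I am trying to scrape a price that is rendered with JavaScript on a well-known website: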

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.nespresso.com/fr/fr/order/capsules/original/"

browser = webdriver.PhantomJS(executable_path = "C:/phantomjs-2.1.1-windows/bin/phantomjs.exe")
browser.get(url)
html = browser.page_source

soup = BeautifulSoup(html, 'lxml')

soup.find("span", {'class':'ProductListElement__price'}).text

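But the only result I get is '\xa0', which is the value in the page source and not the JavaScript-rendered value. I really don't know what I am doing wrong...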

Best regards

Best answer

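You don't need the overhead of a browser. The information is in a script tag, so you can extract it with a regular expression and process it with the json library.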

import json
import re

import requests

r = requests.get('https://www.nespresso.com/fr/fr/order/capsules/original/')
# Capture the argument of the window.ui.push(...) call that mentions ProductList
p = re.compile(r'window\.ui\.push\((.*ProductList.*)\)')
data = json.loads(p.findall(r.text)[0])
# Map each product name to its price
products = {product['name']: product['price'] for product in data['configuration']['eCommerceData']['products']}
print(products)

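Regex: the pattern matches the literal text window.ui.push( and captures everything up to the last closing parenthesis on that line, provided the captured text contains ProductList.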
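
To try the script-tag approach without hitting the network, here is a minimal offline sketch. The SAMPLE_HTML fragment, the product names and prices, and the helper extract_product_prices are all hypothetical and only mimic the shape the answer relies on (configuration -> eCommerceData -> products); the real page may look different. As a variation on the answer's greedy regex, this sketch only uses the regex to locate window.ui.push( and then lets json.JSONDecoder.raw_decode read exactly one JSON value from that position, so trailing text on the same line does not matter.

```python
import json
import re

# Hypothetical page fragment; the real page's markup may differ.
SAMPLE_HTML = """
<html><body>
<span class="ProductListElement__price">&nbsp;</span>
<script>
window.ui = window.ui || [];
window.ui.push({"id": "ProductList", "configuration": {"eCommerceData": {"products": [{"name": "Capsule A", "price": 0.41}, {"name": "Capsule B", "price": 0.45}]}}})
</script>
</body></html>
"""

# Locate the start of each window.ui.push( call, skipping any whitespace
# after the parenthesis so the JSON value starts at match.end().
PUSH_CALL = re.compile(r"window\.ui\.push\(\s*")


def extract_product_prices(html):
    """Return {name: price} from the first push payload that lists products."""
    decoder = json.JSONDecoder()
    for match in PUSH_CALL.finditer(html):
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            payload, _ = decoder.raw_decode(html, match.end())
        except json.JSONDecodeError:
            continue  # not a JSON argument; try the next call
        if not isinstance(payload, dict):
            continue
        products = (
            payload.get("configuration", {})
            .get("eCommerceData", {})
            .get("products")
        )
        if products:
            return {p["name"]: p["price"] for p in products}
    return {}


print(extract_product_prices(SAMPLE_HTML))
```

With a live page, the same helper could be called on requests.get(url).text, assuming the page still embeds its data in this form.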

This question about Python, BS and Selenium comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/59383669/
