I'm trying to scrape dress listings from this site: https://www.libertylondon.com/uk/department/women/clothing/dresses/
Naturally I'm interested in all of the results, not just the first 60. After clicking the "Show More" button a few times I end up at this URL: https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300
I expected the following code to download the full page shown above, but for some reason it still returns only the first 60 results.
import requests
import bs4

url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300"
res = requests.get(url)
res.encoding = "utf-8"
res.raise_for_status()
html = res.text
soup = bs4.BeautifulSoup(html, "lxml")
elements = soup.find_all("div", attrs={"class": "product product-tile"})
I can tell the problem lies with the request itself, because the soup variable doesn't contain the full HTML I see when I inspect the page in the browser, but I can't work out why.
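The reason the request above ignores the extra parameters is that everything after `#` in a URL is a fragment: browsers never send it to the server, and neither does requests, so the server sees a plain request for the first page. The "Show More" results are loaded client-side by JavaScript. A quick standard-library check makes this visible:

```python
from urllib.parse import urlsplit

url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300"
parts = urlsplit(url)

# No query string is present -- nothing after "?" to send to the server:
print(repr(parts.query))     # ''
# The paging parameters live in the fragment, which only the browser sees:
print(repr(parts.fragment))  # 'sz=60&start=300'
```

So the server receives exactly the same request as for the bare `/dresses/` URL, which is why only the first 60 results come back.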
Best answer
Try the URL below; it fetches all 331 elements. The key difference is that the parameters now follow `?` (a query string the server actually receives) rather than `#` (a fragment handled only in the browser), and `format=ajax` asks the endpoint to return the product grid directly.
url: https://www.libertylondon.com/uk/department/women/clothing/dresses/?sz=331&start=0&format=ajax
import requests
import bs4

# Request the full result set via query parameters instead of a fragment:
url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/?sz=331&start=0&format=ajax"
res = requests.get(url)
res.encoding = "utf-8"
res.raise_for_status()
html = res.text
soup = bs4.BeautifulSoup(html, "lxml")
elements = soup.find_all("div", attrs={"class": "product product-tile"})
print(len(elements))
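One caveat: the hard-coded `sz=331` will go stale as the catalogue grows or shrinks. Assuming the `sz`/`start`/`format=ajax` parameters keep working as above (an assumption, not a documented API), a more durable sketch is to page in fixed steps until a request returns no product tiles. The `page_url` and `fetch_all_tiles` names here are illustrative, not from the original answer:

```python
import requests
import bs4
from urllib.parse import urlencode

BASE = "https://www.libertylondon.com/uk/department/women/clothing/dresses/"
PAGE_SIZE = 60  # the site serves results in pages of 60

def page_url(start, size=PAGE_SIZE):
    """Build the AJAX URL for one page -- query parameters, not a fragment."""
    return BASE + "?" + urlencode({"sz": size, "start": start, "format": "ajax"})

def fetch_all_tiles():
    """Page through the catalogue until a request returns no product tiles."""
    tiles, start = [], 0
    while True:
        res = requests.get(page_url(start))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, "lxml")
        batch = soup.find_all("div", attrs={"class": "product product-tile"})
        if not batch:  # an empty page means we've walked past the last result
            break
        tiles.extend(batch)
        start += PAGE_SIZE
    return tiles
```

Calling `fetch_all_tiles()` should then collect every tile regardless of the current catalogue size, at the cost of a handful of sequential requests.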
For "python - Scraping information from a website with a 'Show More' button", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57953996/