I'm trying to scrape dress listings from this site: https://www.libertylondon.com/uk/department/women/clothing/dresses/
Naturally I'm interested in all of the results, not just the first 60. After clicking the "Show More" button a few times I end up at this URL: https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300
I expected the following code to download the full page shown above, but for some reason it still returns only the first 60 results.
import requests
import bs4

url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300"
res = requests.get(url)
res.encoding = "utf-8"
res.raise_for_status()
html = res.text
soup = bs4.BeautifulSoup(html, "lxml")
elements = soup.find_all("div", attrs={"class": "product product-tile"})
I can tell the problem lies with the request itself, because the soup variable doesn't contain the full HTML I see when I inspect the page in the browser, but I can't work out why.
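The reason the request above ignores the extra parameters is that everything after `#` in a URL is a fragment: browsers never send it to the server, and neither does requests, so the server sees a plain request for the first page. The "Show More" results are loaded client-side by JavaScript. A quick standard-library check makes this visible:

```python
from urllib.parse import urlsplit

url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/#sz=60&start=300"
parts = urlsplit(url)

# No query string is present -- nothing after "?" to send to the server:
print(repr(parts.query))     # ''
# The paging parameters live in the fragment, which only the browser sees:
print(repr(parts.fragment))  # 'sz=60&start=300'
```

So the server receives exactly the same request as for the bare `/dresses/` URL, which is why only the first 60 results come back.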
Best answer
Try the URL below; it fetches all 331 elements. The key difference is that the parameters now follow `?` (a query string the server actually receives) rather than `#` (a fragment handled only in the browser), and `format=ajax` asks the endpoint to return the product grid directly.
url: https://www.libertylondon.com/uk/department/women/clothing/dresses/?sz=331&start=0&format=ajax
import requests
import bs4

# Request the full result set via query parameters instead of a fragment:
url = "https://www.libertylondon.com/uk/department/women/clothing/dresses/?sz=331&start=0&format=ajax"
res = requests.get(url)
res.encoding = "utf-8"
res.raise_for_status()
html = res.text
soup = bs4.BeautifulSoup(html, "lxml")
elements = soup.find_all("div", attrs={"class": "product product-tile"})
print(len(elements))
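One caveat: the hard-coded `sz=331` will go stale as the catalogue grows or shrinks. Assuming the `sz`/`start`/`format=ajax` parameters keep working as above (an assumption, not a documented API), a more durable sketch is to page in fixed steps until a request returns no product tiles. The `page_url` and `fetch_all_tiles` names here are illustrative, not from the original answer:

```python
import requests
import bs4
from urllib.parse import urlencode

BASE = "https://www.libertylondon.com/uk/department/women/clothing/dresses/"
PAGE_SIZE = 60  # the site serves results in pages of 60

def page_url(start, size=PAGE_SIZE):
    """Build the AJAX URL for one page -- query parameters, not a fragment."""
    return BASE + "?" + urlencode({"sz": size, "start": start, "format": "ajax"})

def fetch_all_tiles():
    """Page through the catalogue until a request returns no product tiles."""
    tiles, start = [], 0
    while True:
        res = requests.get(page_url(start))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, "lxml")
        batch = soup.find_all("div", attrs={"class": "product product-tile"})
        if not batch:  # an empty page means we've walked past the last result
            break
        tiles.extend(batch)
        start += PAGE_SIZE
    return tiles
```

Calling `fetch_all_tiles()` should then collect every tile regardless of the current catalogue size, at the cost of a handful of sequential requests.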
For "python - Scraping information from a website with a 'Show More' button", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57953996/