Python 3: using requests does not get the full content of a web page

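I am testing the requests module to fetch the content of a web page. But when I look at the result, I see that it does not contain the full content of the page.

Here is my code: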

import requests
from bs4 import BeautifulSoup

url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

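Likewise, in the Chrome browser, if I view the page source, I do not see the full content.

Is there a way to get the full content of the example page I have provided?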

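Best answer

The page is rendered with JavaScript, which makes further requests to fetch additional data. requests only returns the initial HTML and never runs those scripts. You can use Selenium to get the complete page.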

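from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"

# Start a real Chrome browser so the page's JavaScript actually runs.
driver = webdriver.Chrome()
try:
    driver.get(url)
    # page_source is the page as the browser currently has it, not the raw response.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    # Close the browser even if loading or parsing fails.
    driver.quit()

print(soup.prettify())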
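
To see why the requests version in the question comes up short, here is a minimal sketch using only the standard library, so it runs without a network connection or a browser. The HTML strings, the ListItemCounter class and the count_list_items helper are made up for illustration and are not the real Nordstrom markup. The point is only that an HTML parser reads script text as plain text and never executes it, so anything a script would insert is missing from the static response.

```python
from html.parser import HTMLParser

# Hypothetical static response: the list is empty in the HTML the server
# sends, and a script is expected to fill it in once a browser runs it.
STATIC_HTML = """
<html>
  <body>
    <ul id="products"></ul>
    <script>
      document.getElementById("products").innerHTML = "<li>Dress A</li>";
    </script>
  </body>
</html>
"""

# Hypothetical DOM after a browser has executed the script above.
RENDERED_HTML = """
<html>
  <body>
    <ul id="products"><li>Dress A</li></ul>
  </body>
</html>
"""


class ListItemCounter(HTMLParser):
    """Counts <li> elements that exist in the markup itself."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Script bodies are delivered as text, so an "<li>" inside a
        # JavaScript string never reaches this method.
        if tag == "li":
            self.count += 1


def count_list_items(html):
    """Return the number of <li> elements found in an HTML string."""
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.count


print(count_list_items(STATIC_HTML))    # 0: what a plain HTTP fetch would see
print(count_list_items(RENDERED_HTML))  # 1: what a browser ends up with
```

A quick check along these lines (search the raw response for an element you can see in the browser) is one way to tell whether a page needs a browser-based tool such as Selenium at all.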

For other solutions, see my answer to Scraping Google Finance (BeautifulSoup).

Regarding "Python 3: using requests does not get the full content of a web page", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47730671/
