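When I inspect the elements in the browser, I can clearly see the exact page content. But when I run the script below, some details of the page are missing. In the browser's inspector I can see "#document" elements, but they are missing from what my script gets. How can I view the details of the #document elements, or extract them with a script?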
from bs4 import BeautifulSoup
import requests
response = requests.get('http://123.123.123.123/')
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())
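The `#document` node that the browser's inspector shows under a `<frame>` or `<iframe>` is a separate document, loaded from that tag's `src` URL. `requests` only downloads the outer page, so the response contains the `<frame>` tags themselves but none of their content. The stdlib-only sketch below illustrates this on a made-up sample page; the sample HTML and the `frame_sources` helper are hypothetical, not taken from the asker's site.

```python
from html.parser import HTMLParser


class FrameSourceParser(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_sources(html):
    """Hypothetical helper: return the src of each frame/iframe in html."""
    parser = FrameSourceParser()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical outer page, similar to what a frameset-based web UI returns.
OUTER_PAGE = """
<html>
  <frameset cols="20%,80%">
    <frame src="menu.html" name="menu">
    <frame src="main.html" name="main">
  </frameset>
</html>
"""

print(frame_sources(OUTER_PAGE))  # ['menu.html', 'main.html']
```

The outer HTML only tells you where the frame documents live; their text has to be fetched with further requests.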
Best answer
You need to make additional requests to get the content of each frame page:
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://123.123.123.123/'

with requests.Session() as session:
    # Fetch the outer page; it only contains the <frameset>/<frame> tags.
    response = session.get(BASE_URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each frame's content is a separate document, so request it via its src URL.
    for frame in soup.select("frameset frame"):
        frame_url = urljoin(BASE_URL, frame["src"])
        response = session.get(frame_url)
        frame_soup = BeautifulSoup(response.content, 'html.parser')
        print(frame_soup.prettify())
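The answer's loop only looks at `<frame>` elements directly inside a `<frameset>`. Pages may also use `<iframe>`, and a frame document can itself contain frames. Below is a hedged, stdlib-only sketch of a recursive variant; `collect_frames` and the `fetch` callback are hypothetical names, and `fetch` is passed in so the logic can be checked offline. In real use it could be something like `lambda u: session.get(u).text`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceParser(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_frames(url, fetch, max_depth=3, seen=None):
    """Return {url: html} for url and every frame/iframe reachable from it.

    fetch(url) -> str is supplied by the caller; max_depth limits nesting,
    and `seen` prevents fetching the same URL twice (e.g. frames that loop).
    """
    if seen is None:
        seen = {}
    if url in seen or max_depth < 0:
        return seen
    html = fetch(url)
    seen[url] = html
    parser = FrameSourceParser()
    parser.feed(html)
    parser.close()
    for src in parser.sources:
        # Resolve relative src values against the document that contains
        # the frame, not the site root, so nested frames point correctly.
        collect_frames(urljoin(url, src), fetch, max_depth - 1, seen)
    return seen


# Offline usage with hypothetical pages standing in for HTTP responses.
pages = {
    "http://example.test/": '<frameset><frame src="menu.html"><frame src="sub/main.html"></frameset>',
    "http://example.test/menu.html": "<ul><li>Status</li></ul>",
    "http://example.test/sub/main.html": '<p>Main</p><iframe src="detail.html"></iframe>',
    "http://example.test/sub/detail.html": "<p>Detail</p>",
}
result = collect_frames("http://example.test/", lambda u: pages[u])
for frame_url in result:
    print(frame_url)
```

Resolving each `src` against its parent document's URL is the main design choice here: `detail.html` inside `sub/main.html` resolves to `http://example.test/sub/detail.html`, which `urljoin(BASE_URL, ...)` alone would get wrong for nested frames.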
For more on "python - Unable to extract #document from an HTML file with Python web scraping", see the similar question on Stack Overflow: https://stackoverflow.com/questions/42952404/