python - urllib2/requests does not display a web page's iframe

Tags: python iframe beautifulsoup urllib2 python-requests

I am trying to scrape some book data from www.amazon.in:

http://www.amazon.in/Life-What-Make-Preeti-Shenoy/dp/9380349300/ref=sr_1_6?s=books&ie=UTF8&qid=1424652069&sr=1-6

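I need the summary of the book, which sits inside an iframe. The problem is that when I open the URL with `requests`, the response does not contain the iframe.

For example, when I do this: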

bookPage = requests.get(bookURL).text
bookSoup = BeautifulSoup(bookPage, "lxml")

bookPage contains no iframe, although the actual page in the browser does.

I also tried urllib2, but that does not seem to work either.

What is going wrong?

Best answer

You can get the book summary from the noscript tag inside the div element with id="bookDescription_feature_div":

>>> from bs4 import BeautifulSoup
>>> import requests
>>> 
>>> response = requests.get('http://www.amazon.in/Life-What-Make-Preeti-Shenoy/dp/9380349300/ref=sr_1_6?s=books&ie=UTF8&qid=1424652069&sr=1-6',
...                         headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'})
>>> 
>>> soup = BeautifulSoup(response.content, "lxml")  # name the parser explicitly, as in the question
>>> print(soup.select('div#bookDescription_feature_div noscript')[0].get_text(strip=True))
Ankita Sharma has the world in her palms. She is young, smart and heads turn at every corner she walks by. Born into a conservative middle class household - this defines the chronicle of her life. Set in a time when Doordarshan was the prime source of entertainment and writing love letters was the general fad, every youngster dreams of the thrills of college life. And so, her admission into an MBA institute in Mumbai follows. Ankita's story begins here, from her life as a college student. Life seems all sunshine and flowers until a drastic turn leaves her staring at a disturbing path, only because of her own misdoing. Jump to six months later. The sun glistens on a sombre building. Magnetized in view, the words - “Mental Institute”. Who is the face staring out of the window?What if destiny twisted your journey? What if it dragged you to a place that houses your worst fears? Would you stand and fight or would you run? Set in the late eighties, across two cities, Life is What You Make It is a compelling account of growing up, determination, faith and how an unconquerable spirit can overcome the punches destiny throws at you. At its core, it is a love story that makes us question our identity and the concept of sanity.
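
A likely explanation, though this answer does not confirm it, is that the iframe is created by JavaScript in the browser. `requests` and urllib2 only download the HTML the server sends and do not run scripts, so the iframe never appears, while the noscript fallback is already in the raw markup. The sketch below illustrates this offline. `SAMPLE_HTML` is a hypothetical, heavily simplified stand-in for the product page, and `extract_description` and `has_iframe` are illustrative helper names rather than part of any library. It uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the product page (an assumption, not
# Amazon's real markup): the iframe would only be created by JavaScript, while
# a <noscript> fallback carries the description text.
SAMPLE_HTML = """
<html><body>
<div id="bookDescription_feature_div">
  <script>/* would build the description iframe in a browser */</script>
  <noscript><div>Ankita Sharma has the world in her palms.</div></noscript>
</div>
</body></html>
"""


def extract_description(html, parser="html.parser"):
    """Return the text of the noscript fallback, or None if it is absent."""
    soup = BeautifulSoup(html, parser)
    node = soup.select_one("div#bookDescription_feature_div noscript")
    if node is None:
        return None
    return node.get_text(strip=True)


def has_iframe(html, parser="html.parser"):
    """Report whether the raw HTML contains any <iframe> element."""
    return BeautifulSoup(html, parser).find("iframe") is not None


print(has_iframe(SAMPLE_HTML))           # no iframe in the raw HTML
print(extract_description(SAMPLE_HTML))  # text taken from the noscript fallback
```

Returning None when the element is missing avoids the IndexError that `select(...)[0]` raises if the page layout changes or the request is blocked. With a live page, you would pass `response.content` from the session above instead of `SAMPLE_HTML`.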

For python - urllib2/requests does not display a web page's iframe, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28665790/
