python - BeautifulSoup is not extracting all of the HTML

Tags: python beautifulsoup mechanize urllib

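We are trying to get product URLs from this page of the Forever 21 website (http://www.forever21.com/Product/Category.aspx?br=f21&category=dress&pagesize=100&page=1). For some reason, BeautifulSoup does not pick up the elements with class "item_pic", even though they appear in the site's HTML. We have tried requests, Mechanize, and Selenium, all without success. The commented-out code is left over from earlier attempts to fetch the HTML (none of which worked). Here is our code: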

from bs4 import BeautifulSoup
import urllib
import urllib2
import requests

#driver = webdriver.Firefox()
url = "http://www.forever21.com/Product/Category.aspx?br=f21&category=dress&pagesize=100&page=1"
#r = driver.get(url)
#html = r.read()
#headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
#html = requests.get(url, headers=headers)
#response = opener.open(url)
#html = response.read()
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
print soup

Any idea what is going wrong here?

Best answer

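Most of the content is added dynamically, so the page source that urllib fetches does not contain it. You just need to mimic the Ajax request that fetches the content: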

import requests

params = {"action": "getcategory",
        "br": "f21",
        "category": "dress",
        "pageno": "",
        "pagesize": "",
        "sort": "",
        "fsize": "",
        "fcolor": "",
        "fprice": "",
        "fattr": ""}

url = "http://www.forever21.com/Ajax/Ajax_Category.aspx"

js = requests.get(url,params=params).json()
print(js)

The JSON response gives you almost all of the dynamic content; a snippet of it looks like this:

{u'CategoryHTML': u'<div class="product_item gtm_prod" data-name="Twelve Lace V-Neck Mini Dress" data-sku="2000229555" data-brand="F21" data-product-list="category dress pagesize 120" data-price="58.00" data-retail="58.00">\r\n<div class="item_pic">\r\n<div class="m_qv" alt="quick view" onclick="fnShowProductPopup(\'f21\',\'dress\',\'2000229555\',\'\');" ><span class="quick_view">quick view</span></div>\r\n<a href="http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229555&VariantID=">\r\n<div id="imgDiv_20

So what you want is under js[u'CategoryHTML']:

In [3]: import requests
   ...: from bs4 import BeautifulSoup
   ...: params = {"action": "getcategory",
   ...:         "br": "f21",
   ...:         "category": "dress",
   ...:         "pageno": "",
   ...:         "pagesize": "",
   ...:         "sort": "",
   ...:         "fsize": "",
   ...:         "fcolor": "",
   ...:         "fprice": "",
   ...:         "fattr": ""}
   ...: url = "http://www.forever21.com/Ajax/Ajax_Category.aspx"
   ...: js = requests.get(url, params=params).json()
   ...: soup = BeautifulSoup(js[u'CategoryHTML'], "html.parser")
   ...: [a["href"] for a in soup.select("div.item_pic a")]
   ...: 

Out[3]: 
[u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229555&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000235044&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000225681&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000250594&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000231693&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194240&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192742&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000191102&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000214728&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000195373&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000213366&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000190888&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000231562&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000195713&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000207425&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000213751&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229255&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229243&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229254&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000215480&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000250589&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000208752&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000195206&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193780&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000199117&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192754&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192732&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000199660&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000207415&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000207430&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193799&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194207&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229598&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193794&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000233798&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193784&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193758&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194949&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000215792&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194308&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194232&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192739&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193801&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194208&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000237450&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229676&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000195483&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000215685&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000231583&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000213912&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000191263&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000234792&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000195271&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000197171&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000250281&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000208855&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000215076&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000216738&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194194&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194302&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194303&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000213216&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000213495&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000233096&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192273&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000212922&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000217399&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000209239&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000250603&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000195754&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000197042&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194183&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194281&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000217421&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000233947&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194295&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000230752&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000215044&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000191569&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000191576&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000215150&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000250593&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000188763&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000215566&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000234952&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000214224&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000220848&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000214184&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000213990&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000232029&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000212710&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000230949&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000231443&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192879&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192588&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000235216&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000192281&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000212697&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000213386&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000208787&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193657&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000208320&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000231811&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000196529&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000208541&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229980&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000195375&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229866&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000234442&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000194607&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000191105&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000196404&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000199193&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000216479&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000198558&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000193739&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000231532&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229938&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000229912&VariantID=',
 u'http://www.forever21.com/Product/Product.aspx?BR=f21&Category=dress&ProductID=2000191678&VariantID=']
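
To check whether a page needs this treatment at all, one option is to search the raw response text for the class name before parsing it. The sketch below is written for Python 3 (unlike the Python 2 session above) and uses two short, hypothetical HTML strings as stand-ins for the served page and the Ajax fragment; the helper name missing_markers is made up for illustration.

```python
def missing_markers(raw_html, markers):
    """Return the markers that do not occur anywhere in the raw HTML text."""
    return [m for m in markers if m not in raw_html]


# Hypothetical stand-in for a page whose product markup is injected by script.
static_page = (
    '<html><body><div id="products"></div>'
    '<script src="category.js"></script></body></html>'
)

# Hypothetical stand-in for the HTML fragment returned by the Ajax endpoint.
ajax_fragment = (
    '<div class="item_pic">'
    '<a href="http://www.forever21.com/Product/Product.aspx">dress</a>'
    '</div>'
)

marker = 'class="item_pic"'
print(missing_markers(static_page, [marker]))    # ['class="item_pic"']
print(missing_markers(ajax_fragment, [marker]))  # []
```

If the marker is missing from the text the server actually returned, no parser can find it, and the request that delivers it has to be located in the browser's network tab instead.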

You can change the parameters to affect the results you get.
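
As a rough sketch of what varying the parameters could look like, the Python 3 helper below builds the same parameter dictionary with a few values filled in and shows the resulting query string. The helper names build_params and request_url are hypothetical, and the assumption that "pageno" and "pagesize" accept numbers such as 2 and 60 is a guess based on the "page" and "pagesize" values in the question's URL; the endpoint may interpret them differently.

```python
from urllib.parse import urlencode

AJAX_URL = "http://www.forever21.com/Ajax/Ajax_Category.aspx"


def build_params(category="dress", pageno="", pagesize=""):
    """Build the query parameters used in the answer, leaving filters empty.

    Assumption: pageno and pagesize take plain numbers; this is not confirmed
    by the original answer, which sent them as empty strings.
    """
    return {
        "action": "getcategory",
        "br": "f21",
        "category": category,
        "pageno": str(pageno),
        "pagesize": str(pagesize),
        "sort": "",
        "fsize": "",
        "fcolor": "",
        "fprice": "",
        "fattr": "",
    }


def request_url(params):
    """Return the full URL that a GET request with these params would use."""
    return AJAX_URL + "?" + urlencode(params)


# Hypothetical values: second page, 60 items per page.
print(request_url(build_params(pageno=2, pagesize=60)))
```

The dictionary returned by build_params can be passed as params= to requests.get exactly as in the answer's code, and the response parsed the same way.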

Regarding "python - BeautifulSoup is not extracting all of the HTML", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40228558/
