python - How to scrape product prices for a specific region

Tags: python html web-scraping beautifulsoup

As an exercise, I am trying to scrape information about washing machines from the Lowes website: https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977

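To get the price, I need to find the div with class "product-pricing" and read the text of the span inside it. However, the div I see when I inspect the page in my browser is completely different from the one I get when I scrape it with BeautifulSoup. In the browser inspector it looks like this: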

<div class="product-pricing">
<div class="pl-price js-pl-price" tabindex="-1">                 

     <!-- Was Price -->
     <div class="v-spacing-mini">
           <span class="h5 js-price met-product-price art-pl-contractPricing0" data-met-type="was">$499.00</span>
     </div>
     <div class="v-spacing-mini">
           <p class="darkMidGrey art-pl-wasPriceLbl0">was: $749.00</p>

              <small class="green small art-pl-saveThruLbl0">SAVE 33% thru 10/30/2018</small><br>
     </div>

  <!-- Start of Product Family Pricing -->

  <!-- Contractor Pack Messaging -->

  <!-- End of Product Family Pricing -->
  </div>
  <div class="v-spacing-small">
     <a role="link" tabindex="-1" data-toggle="popover" aria-haspopup="true" data-trigger="focus" data-placement="bottom auto" data-content="FREE local delivery applies to any major appliance $396 or more, full-size gas grills $498 or more, patio furniture orders $498 or more, and riding and ZTR mowers $999 or more. Applies to standard deliveries in US only. Purchase threshold calculated before taxes, after applicable discounts, if any. Additional fees may apply." data-original-title="Free Delivery" class="js-truck-delivery"><i class="icon-truck" title="FREE Delivery" aria-label="FREE Delivery."></i> <strong>FREE Delivery</strong></a>
  </div>
</div>

But when I scrape the page, I get this instead:

<div class="product-pricing">
<div class="v-spacing-jumbo clearfix">
  <a aria-haspopup="true" class="js-enter-location" data-content="Since Lowes.com is national in scope, we check inventory at your local store first in an effort to fulfill your order more quickly. You may find product or pricing that differ from that of your local store, but we make every effort to minimize those differences so you can get exactly what you want at the best possible price." data-placement="top auto" data-toggle="popover" data-trigger="focus" role="link" tabindex="-1">
     <p class="h6" id="ada-enter-location"><span>Enter your location</span>
        <i aria-hidden="true" class="icon-info royalBlue"></i>
     </p>
  </a>
  <p class="small-type secondary-text" tabindex="-1">for pricing and availability.</p>
</div>
<form action="#" class="met-zip-container js-store-locator-form" data-modal-open="true" data-zip-in="true" id="store-locator-form">
  <input name="redirectUrl" type="hidden" value="/pl/Washing-machines-Washers-dryers-Appliances/4294857977"/>
  <div class="form-group product-form-group">
     <div class="input-group">
        <input aria-label="Enter your zip code" autocompletetype="find-a-store-search" class="form-control js-list-zip-entry-input met-zip-code" name="searchTerm" placeholder="ZIP Code" role="textbox" tabindex="-1" type="text"/>
        <span class="input-group-btn">
        <button class="btn btn-primary js-list-zip-entry-submit met-zip-submit" data-linkid="get-pricing-and-availability-zip-in-modal-submit" tabindex="-1" type="submit">OK</button>
        </span>
     </div>
     <span class="inline-help">ZIP Code</span>
  </div>
 </form>
</div>

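I think this happens because the site has to use my location to determine the right price. There appears to be a hidden input; my browser knows my location and passes it to the site. Is there a way for BeautifulSoup to scrape the price that appears after the site has checked my location?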

This is the code I am using:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977'

# Download the raw HTML of the listing page.
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, features="lxml")

# Each product's right-hand column contains its pricing block.
containers = page_soup.findAll("div", {"class": "product-wrapper-right"})
for container in containers:
    # [0] raises IndexError if the container has no js-price span.
    price = container.findAll("span", {"class": "js-price"})[0].text

Edit: the specific code that gives me the second HTML snippet is:

container.findAll("div", {"class":"product-pricing"})   
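
One way to confirm the diagnosis above is to check which of the two variants of the "product-pricing" div a given response contains. The following is only a minimal sketch: the helper name `classify_pricing_block` is hypothetical, and the class names `js-price` and `js-enter-location` are taken from the two HTML snippets in the question, so they may not match the live site.

```python
from bs4 import BeautifulSoup


def classify_pricing_block(html):
    """Report which variant of the product-pricing div is present.

    Returns ("price", text), ("needs_location", None) or ("unknown", None).
    """
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", class_="product-pricing")
    if block is None:
        return ("unknown", None)

    # Variant seen in the browser: a span carrying the js-price class.
    price = block.find("span", class_="js-price")
    if price is not None:
        return ("price", price.get_text(strip=True))

    # Variant seen when scraping: a prompt asking for a location.
    if block.find("a", class_="js-enter-location") is not None:
        return ("needs_location", None)

    return ("unknown", None)
```

If this returns "needs_location" for the HTML fetched with urlopen, the server has sent the location prompt rather than a price, so no BeautifulSoup selector can recover the price from that response.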

Best answer

I'm not 100% sure this will solve your problem, but Selenium may help: it drives a real browser, so it sends the same data a normal browser sends when visiting a site.

Link to an introduction to Selenium: https://medium.freecodecamp.org/better-web-scraping-in-python-with-selenium-beautiful-soup-and-pandas-d6390592e251
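
A rough sketch of that idea, not a tested solution for this site: load the page with Selenium, then hand the rendered HTML to BeautifulSoup as before. It assumes Chrome and a matching driver are installed. The function names `fetch_rendered_html` and `extract_prices` are made up for illustration. A fresh browser session may still be shown the "Enter your location" prompt, in which case the ZIP code form from the question would have to be filled in first.

```python
from bs4 import BeautifulSoup

URL = "https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977"


def fetch_rendered_html(url):
    """Open the page in a real browser and return its current DOM as HTML."""
    # Imported here so extract_prices() can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def extract_prices(html):
    """Return one entry per product container: the price text, or None."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for container in soup.find_all("div", class_="product-wrapper-right"):
        span = container.find("span", class_="js-price")
        # None marks a product whose price was not rendered,
        # for example because the location prompt was shown instead.
        prices.append(span.get_text(strip=True) if span else None)
    return prices
```

Calling `extract_prices(fetch_rendered_html(URL))` would then give one entry per product. Using `find` and checking for None, rather than indexing `findAll(...)[0]`, keeps the loop from failing on products that have no price span.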

Regarding "python - How to scrape product prices for a specific region", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53074249/
