javascript - Python BeautifulSoup cannot get text from a web page

Tags: javascript python html web-scraping beautifulsoup

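I am trying to get the product name from a web page using Python, but it only returns an empty tag. I also tried the requests library and the lxml parser with BeautifulSoup. Please help me solve this. Thanks in advance :-)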

HTML on the website:

<div class="product-name">SWAN</div>
<div class="product-price">
   <span class="final-price">₹10650</span>
</div>
<div class="specification">
   <div>Specifications</div>
   <table>
      <tr>
         <td>....</td>
      </tr>
      <tr>
         <td>....</td>
      </tr>
   </table>
</div>

Python code:

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

url = "http://opor.in/ProductDetail/Index?ProductId=212"
page = urlopen(url).read()
html = bs(page, 'html.parser')
model_name = html.find('div', attrs={'class':'product-name'})
spec = html.find('div', attrs={'class':'specification'})
print(model_name)
print(spec)

Output:

<div class="product-name"></div>
<div class="specification">
<div>Specifications</div>
<table></table>
</div>

Best answer

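The data is loaded by JavaScript, so it is not in the static HTML that BeautifulSoup parses. However, the data used to fill the DOM is available inside a script tag. Take the value from that script tag, load it as JSON, and then read the keys you need.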
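
A quick way to confirm that a value lives in a script tag rather than in the rendered markup is to search the raw HTML for a value you can see in the browser. The sketch below is only an illustration: the sample HTML, the `ProductName` key, the `/static/app.js` path, and the helper `scripts_containing` are all assumptions, not taken from the real page. A regular expression is used just to keep the sketch dependency-free; with BeautifulSoup, `soup.find_all('script')` does the same job.

```python
import re

def scripts_containing(html, needle):
    """Return the bodies of <script> elements whose text contains `needle`."""
    bodies = re.findall(
        r"<script\b[^>]*>(.*?)</script>",
        html,
        flags=re.DOTALL | re.IGNORECASE,
    )
    return [body for body in bodies if needle in body]

# Hypothetical page: the div is empty in the static HTML, and the value
# only appears inside an inline script (key name and path are made up).
sample_html = '''
<div class="product-name"></div>
<script>
    var productDetail = {"ProductName": "SWAN"};
</script>
<script src="/static/app.js"></script>
'''

hits = scripts_containing(sample_html, "SWAN")
print(len(hits), "script block(s) contain the value")
```

If a search like this finds the value only inside a script block, parsing that block is usually simpler than driving a browser to render the page.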

Code:

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
import json

url = "http://opor.in/ProductDetail/Index?ProductId=212"
page = urlopen(url).read()
soup = bs(page, 'html.parser')

for script in soup.find_all('script'):
    # Only the script that defines productDetail holds the product data.
    if 'productDetail' in script.text:
        # Take the text after the assignment up to the first '};', then restore the closing brace.
        data = script.text.split('var productDetail =')[-1].split('};')[0] + "}"
        datajson = json.loads(data.strip())
        print('Product Code :' + datajson['ProductCode'])
        # Use a separate loop variable so the outer one is not overwritten.
        for spec in datajson['ProductSpecification']:
            print(spec['SpecificationName'] + " : " + spec['SpecificationValue'])

Output:

Product Code :1601KFMB
MEMBRANE : MEMBRELLA -ALPHA- 80 GPD (2 NOS)
PUMP : KEMFLO 48 V
APPLICATION : SUITABLE FOR BRACKISH WATER
FILTER LIFE : APPROX 3000 LITRE / 6 MONTHS
FILTERS : SEDIMENT, PRECARBON, POST CARBON
FLOAT : MEMBRELLA
FR : MEMBRELLA /KFL
INLINE SET : MEMBRELLA
INPUT VOLTAGE : 100-300 VOLT AC (50Hz)
INSTALLATION : COUNTER TOP
MAX.OPERATION TDS : 4000 PPM
MEMBRANE TYPE : THIN FILM COMPOSITE
MIN.INLET PRESSURE / TEMP : 0.3 kg / cm2, 10 °C
MODEL : WHALE 25
OPERATING VOLTAGE : 48 VOLT (DC)
PRODUCT DIMENSION : 21.1 (H) x 9.9 (W) x 16.7 (L)
PURIFICATION CAPACITY : 25 LITRES PER HOUR
RECOVERY RATE : MORE THAN 30% AT 27°c ± 2°c
SMPS : MEMBRELLA / EQUALIANT
SOLENOID VALVE : MEMBRELLA / SLX
STORAGE CAPACITY : 20 LITRES
TECHNOLOGY : REVERSE OSMOSIS SYSTEM
TOTAL POWER CONSUMPTION : 50 W
TUBE 1/4 : 5 METERS
TUBE 3/8 : 2 METERS
WEIGHT : 18 kg (Approx)
WARRENTY &  SUPPORT : Since Whale  designs its purifiers and many of its parts  are a truly integrated system. Dealer only  can provide one-stop service ,guaranty and support for any service and maintenance, so most issues can be resolved in a single visit
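
The split on '};' in the answer's code would cut the data short if that sequence ever appeared inside a string value. A possible alternative, sketched here under assumptions, is to let `json.JSONDecoder().raw_decode` parse exactly one JSON value starting right after the assignment and ignore the JavaScript that follows. The helper name `extract_js_object`, the sample script, and its `Note` key are hypothetical. Like the answer's `json.loads` call, this only works when the object literal is valid JSON.

```python
import json
import re

def extract_js_object(script_text, var_name):
    """Return the JSON value assigned by `var <var_name> = ...` in script_text."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        raise ValueError(f"no assignment to {var_name!r} found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever comes after it (here the ';' and the remaining JS).
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj

# Illustrative script body; the "Note" key is made up so that '};'
# appears inside a string value.
sample_script = '''
    var productDetail = {"ProductCode": "1601KFMB", "Note": "ends with };",
        "ProductSpecification": [
            {"SpecificationName": "MODEL", "SpecificationValue": "WHALE 25"}
        ]};
    render(productDetail);
'''

detail = extract_js_object(sample_script, "productDetail")
print("Product Code :" + detail["ProductCode"])
for spec in detail["ProductSpecification"]:
    print(spec["SpecificationName"] + " : " + spec["SpecificationValue"])
```

With the real page, `script_text` would be the text of the script tag found by `soup.find_all('script')`.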

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/59951068/
