I want to scrape products and prices from an online shopping site, and I need help extracting the string written between the tags.
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

my_url = 'https://www.flipkart.com/cameras/mirrorless~type/pr?sid=jek%2Cp31'

# Download the page and parse it
cl = urlopen(my_url)
page_html = cl.read()
cl.close()
ps = soup(page_html, 'html5lib')
ps1 = ps.prettify()

# One container div per product listing
cn = ps.find_all('div', {'class': '_1-2Iqu row'})
print(len(cn))
print(cn[0].div.div)
# Output:
# <div class="_3wU53n">Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM</div>
# I need only the text:
# Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM
Best answer
Replace cn=ps.findAll('div',{'class':'_1-2Iqu row'}) with cn=ps.findAll('div',{'class':'_1-2Iqu row'}, text=True)
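Another way to get only the inner text, independent of the text filter, is to call get_text() on the tag that holds the title (in the question's code, that would be cn[0].div.div.get_text()). The sketch below is only an illustration: sample_html is a hypothetical fragment modelled on the output shown in the question, not the live Flipkart page, and extract_titles is a helper name made up for this example. The class name _3wU53n is taken from the question and may have changed on the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment modelled on the output shown in the question;
# the real page is not fetched here.
sample_html = """
<div class="_1-2Iqu row">
  <div class="col">
    <div class="_3wU53n">Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM</div>
  </div>
</div>
"""

def extract_titles(html):
    """Return the text of every div whose class is _3wU53n (illustrative helper)."""
    page = BeautifulSoup(html, "html.parser")
    titles = []
    for title_div in page.find_all("div", class_="_3wU53n"):
        # get_text() returns the text inside the tag, without the markup
        titles.append(title_div.get_text(strip=True))
    return titles

titles = extract_titles(sample_html)
print(titles)
```

The same call works on any Tag object, so it can be applied inside a loop over cn to collect one title per product.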
Regarding Python web scraping and extracting the inner content of a tag, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59583717/