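I need to extract text from a few HTML fragments (in the example below, "325" and "550"). How can I do this with Python 3.6.0, bs4, and urllib? I will then add the extracted data to a CSV file.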
<div class="a-row a-spacing-none">
<a class="a-link-normal a-text-normal" href="https://www.amazon.in/Game-Thrones-Song-Ice-Fire/dp/0007428545">
<span class="a-size-small a-color-secondary">
</span>
<span class="a-size-base a-color-price s-price a-text-bold">
<span class="currencyINR">
</span>
325
</span>
</a>
<span class="a-letter-space">
</span>
<span aria-label='Suggested Retail Price: <span class="currencyINR">&nbsp;&nbsp;</span>550' class="a-size-small a-color-secondary a-text-strike">
<span class="currencyINR">
</span>
550
</span>
</div>
I tried the following code, but I can't get rid of the span tag that comes along with the price:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire'
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parsing
page_soup = soup(page_html, "html.parser")
# grabs each product
containers = page_soup.findAll("div", {"class":"s-item-container"})
contain = containers[0]
price = contain.findAll("span", {"class":"a-size-base a-color-price s-price a-text-bold"})
current_price = price[0].text.strip()
Best answer
As a start, you can select the span element with the class currencyINR. The price is the text node that immediately follows it.
currency = contain.find('span', attrs={"class":"currencyINR"})
# the price is the bare text node right after the (empty) currency span
price = currency.next_sibling.strip()
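The snippet above returns only the first price. To collect both values and write them to CSV, the same idea can be applied to every currencyINR span. The following is a minimal sketch, not a definitive implementation: it runs against a trimmed copy of the HTML from the question (the aria-label attribute is omitted), and the column names current_price and list_price are assumptions chosen for illustration. The live Amazon page may be structured differently.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed copy of the snippet from the question (aria-label omitted).
html = """
<div class="a-row a-spacing-none">
<a class="a-link-normal a-text-normal" href="https://www.amazon.in/Game-Thrones-Song-Ice-Fire/dp/0007428545">
<span class="a-size-base a-color-price s-price a-text-bold">
<span class="currencyINR">
</span>
325
</span>
</a>
<span class="a-size-small a-color-secondary a-text-strike">
<span class="currencyINR">
</span>
550
</span>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

prices = []
for currency in page_soup.find_all("span", class_="currencyINR"):
    sibling = currency.next_sibling
    # Keep only non-empty text nodes; skip the case where a tag (or nothing) follows.
    if isinstance(sibling, str) and sibling.strip():
        prices.append(sibling.strip())

# Written to an in-memory buffer here; for a real file use
# open("prices.csv", "w", newline="") instead (file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["current_price", "list_price"])  # assumed column names
writer.writerow(prices)
csv_text = buffer.getvalue()
print(csv_text)
```

Checking that the sibling is a non-empty string guards against layouts where the currency span is followed by another tag rather than by the price text.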
For more on navigating in BeautifulSoup4 with Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46053129/