python - 网络抓取从网页中提取产品名称

我正在尝试获取此网站上包的名称- http://www.barneys.com/barneys-new-york/women/bags . 到目前为止，我有这段代码:

 from urllib.request import urlopen
    from bs4 import BeautifulSoup
    url="http://www.barneys.com/barneys-new-york/women/bags"
    html = urlopen(url)
    bsObj = BeautifulSoup(html.read(),"html.parser")  
    product_name = bsObj.findAll("a",{"class":"name-link"})
    print(product_name)

我尝试了 renderContents() 和 get_text()，但它们给我错误 (AttributeError)。

最佳答案

名称在 product-name div 中:

from bs4 import BeautifulSoup
import  requests

soup = BeautifulSoup(requests.get("http://www.barneys.com/barneys-new-york/women/bags").content)

print([prod.text.strip() for prod in  soup.select("div.product-name")])

这给了你:

['Lizard iPhone® 6 Plus Case', 'Lizard iPhone® 6 Case', 'Peekaboo Large Satchel', 'Embellished Shoulder Bag', 'Rockstud Reversible Tote', 'Hava Shoulder Bag', 'Rockstud Crossbody', 'PS1 Tiny Shoulder Bag', 'Faye Medium Shoulder Bag', 'Rockstud Crossbody', 'Large Shopper Tote', 'Flat Clutch', 'Wicker Small Crossbody', 'Jotty Duffel', 'P.Y.T. Shoulder Bag', 'Hadley Baby Satchel', 'Beckett Small Crossbody', 'Squarit PM Satchel', 'Double Baguette Micro', 'City Victoria Small Satchel', 'Large Zip Pouch', 'Jotty Duffel', 'Jen Small Crossbody', 'Mini Trouble Shoulder Bag', 'Midi Clutch', 'Midi Clutch', 'Two For One Pouch 10', 'Guitar Rockstud Medium Backpack', 'Embellished Large Messenger', 'Papier A4 Side-Zip Tote', 'Nightingale Micro-Satchel', 'Hand-Carved Atlas Clutch', 'Emerald-Cut Minaudière', 'Trouble II Shoulder Bag', 'Intrecciato Olimpia Small Shoulder Bag', 'Rockstud Large Tote', 'Baguette Micro', 'Bindu Small Clutch', 'Emerald-Cut Minaudière', 'Gotham City Hobo', 'Brillant Sellier PM Satchel', 'Flight Weekender Duffel', 'Sac Mesh Bucket Bag', 'Seema Small Satchel', 'Madison Shoulder Bag', 'Sporty Smiley Crossbody', 'Monogram Large Wallet', 'Monogram Card Case']

如果你想要所有的信息，你可以从带有 thumb-link 类的 anchor 标签中获取它，在 div 中带有 id primary:

print(soup.select("#primary a.thumb-link"))

它给你这样的输出:

<a class="thumb-link" href="http://www.barneys.com/vianel-lizard-iphone%C2%AE-6-plus-case-504475332.html" title="Lizard iPhone® 6 Plus Case">
<img alt="Vianel Lizard iPhone® 6 Plus Case" class="gridImg" data-image-alter="http://product-images.barneys.com/is/image/Barneys/504475332_2_detail?$grid_new_fixed$" data-original="http://product-images.barneys.com/is/image/Barneys/504475332_1_tabletop?$grid_new_fixed$" height="370" onerror="this.src='http://demandware.edgesuite.net/aasv_prd/on/demandware.static/Sites-BNY-Site/-/default/dwd89468c5/images/browse_placeholder_image.jpg'" title="Lizard iPhone® 6 Plus Case" width="231"/>
<noscript>
<img alt="Vianel Lizard iPhone® 6 Plus Case" src="http://product-images.barneys.com/is/image/Barneys/504475332_1_tabletop?$grid_new_fixed$" title="Lizard iPhone® 6 Plus Case?$grid_new_fixed$"/>
</noscript>

您可以从每个返回的 a 中解析图像、标题等。

使用您自己的代码，您需要像上面那样访问 .text 属性:

product_name = [a.text.strip() for a in  bsObj.findAll("a",{"class":"name-link"})]
print(product_name)

这会给你和第一个选择一样的结果:

['Lizard iPhone® 6 Plus Case', 'Lizard iPhone® 6 Case', 'Peekaboo Large Satchel', 'Embellished Shoulder Bag', 'Rockstud Reversible Tote', 'Hava Shoulder Bag', 'Rockstud Crossbody', 'PS1 Tiny Shoulder Bag', 'Faye Medium Shoulder Bag', 'Rockstud Crossbody', 'Large Shopper Tote', 'Flat Clutch', 'Wicker Small Crossbody', 'Jotty Duffel', 'P.Y.T. Shoulder Bag', 'Hadley Baby Satchel', 'Beckett Small Crossbody', 'Squarit PM Satchel', 'Double Baguette Micro', 'City Victoria Small Satchel', 'Large Zip Pouch', 'Jotty Duffel', 'Jen Small Crossbody', 'Mini Trouble Shoulder Bag', 'Midi Clutch', 'Midi Clutch', 'Two For One Pouch 10', 'Guitar Rockstud Medium Backpack', 'Embellished Large Messenger', 'Papier A4 Side-Zip Tote', 'Nightingale Micro-Satchel', 'Hand-Carved Atlas Clutch', 'Emerald-Cut Minaudière', 'Trouble II Shoulder Bag', 'Intrecciato Olimpia Small Shoulder Bag', 'Rockstud Large Tote', 'Baguette Micro', 'Bindu Small Clutch', 'Emerald-Cut Minaudière', 'Gotham City Hobo', 'Brillant Sellier PM Satchel', 'Flight Weekender Duffel', 'Sac Mesh Bucket Bag', 'Seema Small Satchel', 'Madison Shoulder Bag', 'Sporty Smiley Crossbody', 'Monogram Large Wallet', 'Monogram Card Case']

关于python - 网络抓取从网页中提取产品名称，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/36828121/

python - 网络抓取从网页中提取产品名称

上一篇：python - (Python 3.x) Web 抓取代码需要一个多小时才能加载

下一篇：python - Keras - 训练卷积网络，获得自动编码器输出