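python - How do I get all pages with BeautifulSoup?

I want to collect the links from every page. I already have the code below, but whenever I run it, it fails with the error (return self.attrs[key]) KeyError: 'href'. Can anyone help? Thanks. Here is the code: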

from bs4 import BeautifulSoup
import urllib.request
import requests



url = "http://makeupuccino.com/makeup/faces/foundation?page={}"


def get_url(url):
    req = urllib.request.Request(url)
    return urllib.request.urlopen(req)

link = []
nama = []
merek = []
harga = []
gambar = []
deskripsi = []

page = 1
while (requests.get(url.format(page)).status_code==200):
    res = requests.get(url.format(page))
    print(res.url)
    soup = BeautifulSoup(res.content,"html.parser")
    items = soup.findAll("div",{"class":"product-block-inner"})
    if len(items)<=1:break # stop when no more products are found on the next page
    for item in items:

        new_link = item.find("div",{"class":"image"})
        print(new_link["href"])


    page+=1

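Best answer

You selected the div element that is the parent of the anchor tag, not the anchor tag itself, which is what carries the href attribute. Add .a to the lookup inside the loop.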

For example,

print(new_link.a["href"])

gives you the correct link.
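To see the difference in isolation, here is a minimal sketch that runs against a made-up HTML fragment rather than the live site. The markup and the example.com URL are assumptions for illustration only; the real page may nest its tags differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics one product block; not copied from the real site.
html = """
<div class="product-block-inner">
  <div class="image">
    <a href="http://example.com/product-1"><img src="p1.jpg"></a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
image_div = soup.find("div", {"class": "image"})

# The div has no href attribute of its own, so indexing it raises KeyError.
try:
    image_div["href"]
    div_has_href = True
except KeyError:
    div_has_href = False

# .a returns the first <a> descendant of the div, or None if there is none.
anchor = image_div.a
link = anchor["href"] if anchor is not None else None

print(div_has_href)
print(link)
```

Checking for None before indexing keeps the loop from crashing on a product block that happens to have no link.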

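For proper pagination, I can suggest two approaches.

First, find the number of pages and loop over them. In your case, the page count appears in the element with the page-result class. You can extract it with the following code.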

page_numbers = soup.find('div', {'class':'page-result'}).text
# Assumes the text ends with something like "(4 Pages)"
page_numbers = page_numbers.split('(')[-1].replace('Pages)', '')
# One URL per page; for the link you provided this gives 4 pages
Total_pages = ['http://makeupuccino.com/makeup/faces/foundation?page='+str(i) for i in range(1, int(page_numbers) + 1)]
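The page-count idea can also be sketched with a regular expression, which fails more gracefully when the summary text is missing or worded differently. The helper names and the sample string below are hypothetical; the "(N Pages)" format is an assumption about what the page-result element contains.

```python
import re

BASE_URL = "http://makeupuccino.com/makeup/faces/foundation?page={}"


def parse_total_pages(summary):
    """Return the page count from text like '... (4 Pages)', or None if absent."""
    match = re.search(r"\((\d+)\s+Pages?\)", summary)
    return int(match.group(1)) if match else None


def build_page_urls(total_pages, base_url=BASE_URL):
    """Return one URL per page, numbered from 1 to total_pages inclusive."""
    return [base_url.format(page) for page in range(1, total_pages + 1)]


# Assumed sample of the pagination summary; the real wording may differ.
sample = "Showing 1 to 15 of 60 (4 Pages)"
total = parse_total_pages(sample)
urls = build_page_urls(total)

print(total)
print(urls[-1])
```

In a real run, the argument to parse_total_pages would be the .text of the page-result div fetched from the first page.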
Second, break out of the while loop when the text There are no products to list in this Category. appears in the page. Use the following code to implement it,

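soup = BeautifulSoup(res.content,"html.parser")
if "There are no products to list in this Category." in str(soup):
    break # stop the while loop on the first empty page
# the rest of the loop body goes here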
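As a rough sketch of how the stop condition could fit into a complete loop, the generator below takes the download step as a parameter, so each page is requested only once and the logic can be tried without network access. The names iter_pages, fetch, and fake_site are hypothetical, and max_pages is an assumed safety cap that the original code does not have.

```python
NO_PRODUCTS = "There are no products to list in this Category."


def iter_pages(fetch, url_template, max_pages=100):
    """Yield (page_number, html) until the sentinel text or a failed fetch."""
    for page in range(1, max_pages + 1):
        html = fetch(url_template.format(page))  # one request per page
        if html is None or NO_PRODUCTS in html:
            break
        yield page, html


# Stand-in for the real site: a dict lookup that returns None for unknown URLs.
fake_site = {
    "http://example.com/list?page=1": "<div>product A</div>",
    "http://example.com/list?page=2": "<div>product B</div>",
    "http://example.com/list?page=3": "<p>" + NO_PRODUCTS + "</p>",
}

pages = [number for number, _ in iter_pages(fake_site.get, "http://example.com/list?page={}")]
print(pages)
```

With requests, fetch could be a small function that calls requests.get(url) and returns res.text when res.status_code is 200 and None otherwise; each yielded html string can then be passed to BeautifulSoup.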

Although the second solution looks simpler, I recommend the first one: it will teach you more, and it is the proper approach.

Hope this helps! Cheers!

Regarding python - How do I get all pages with BeautifulSoup?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52416158/
