python - Intermittent BeautifulSoup failure when scraping the ISBN of an Amazon book

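I'm trying to scrape some information about certain books on Amazon, but I'm hitting a strange intermittent error that I can't make sense of. At first I thought Amazon was blocking my connection, but then I noticed that the request returns "200 OK" and the response contains the real HTML of the corresponding page.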

Take this book as an example: https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

url = 'https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110/ref=sr_1_1?crid=2PPCQEJD706VY&dchild=1&keywords=books+bestsellers+2020+paperback&qid=1598132071&sprefix=book%2Caps%2C234&sr=8-1'

response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.content, features="lxml")

price = {}

if soup.select("#buyBoxInner > ul > li > span > .a-text-strike") != []:
    price["regular_price"] = float(
        soup.select("#buyBoxInner > ul > li > span > .a-text-strike")[0].string[1:].replace(",", "."))
    price["promo_price"] = float(soup.select(".offer-price")[0].string[1:].replace(",", "."))
else:
    price["regular_price"] = float(soup.select(".offer-price")[0].string[1:].replace(",", "."))
price["currency"] = soup.select(".offer-price")[0].string[0]

This part works fine: I can get the regular price, the promotional price (if there is one), and even the currency. But when I do this:

isbn = soup.select("td.bucket > .content > ul > li")[4].contents[1].string.strip().replace("-", "")

I get "IndexError: list index out of range". Yet when I debug the code, the content is actually there!

Is this a bug in BeautifulSoup? Is the response too long?

Best answer

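Amazon appears to return two versions of the page: one that contains <td class="bucket">, and another that uses several <span> tags instead. This script tries to extract the ISBN from either version: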

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

url = 'https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, features="lxml")

isbn_10 = soup.select_one('span.a-text-bold:contains("ISBN-10"), b:contains("ISBN-10")').find_parent().text
isbn_13 = soup.select_one('span.a-text-bold:contains("ISBN-13"), b:contains("ISBN-13")').find_parent().text

print(isbn_10.split(':')[-1].strip())
print(isbn_13.split(':')[-1].strip())

Prints:

0241985110
978-0241985113
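
If the page comes back in a layout that has no td.bucket element, soup.select(...) returns an empty list, which would explain the intermittent IndexError in the question. Likewise, select_one(...) returns None when nothing matches, so the script above would fail with an AttributeError on an unrecognised layout. The following is a minimal sketch of a more defensive lookup, not a definitive implementation: the helper name find_isbn and the two HTML fragments are hypothetical, hand-written only to imitate the two layouts described above, and real Amazon markup may differ. It uses find() with a regular expression rather than the :contains selector, on the assumption that you want to avoid the deprecation warning recent Soup Sieve releases emit for :contains (they prefer :-soup-contains).

```python
import re

from bs4 import BeautifulSoup


def find_isbn(soup, label):
    """Return the value shown next to `label` (e.g. "ISBN-10"), or None."""
    # Match either layout: <b>label:</b> or <span class="a-text-bold">label</span>.
    tag = soup.find(["b", "span"], string=re.compile(re.escape(label)))
    if tag is None or tag.parent is None:
        return None  # layout not recognised; the caller decides what to do
    # The parent element holds both the label and the value, e.g. "ISBN-10: 0241985110".
    text = tag.parent.get_text()
    return text.split(":")[-1].strip() or None


# Hypothetical fragments that only imitate the two layouts; not real Amazon HTML.
old_layout = (
    '<td class="bucket"><div class="content"><ul>'
    "<li><b>ISBN-10:</b> 0241985110</li>"
    "</ul></div></td>"
)
new_layout = (
    "<ul><li><span>"
    '<span class="a-text-bold">ISBN-13 : </span><span>978-0241985113</span>'
    "</span></li></ul>"
)

for html in (old_layout, new_layout):
    soup = BeautifulSoup(html, "html.parser")
    print(find_isbn(soup, "ISBN-10"), find_isbn(soup, "ISBN-13"))
```

With a helper like this, a missing field shows up as None instead of an exception, so the caller can retry the request or skip the book. If the hyphens are unwanted, as in the question, apply .replace("-", "") to the returned string.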

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/63541601/
