我正在尝试收集有关亚马逊上某些书籍的一些信息,但我遇到了一个我无法理解的奇怪故障错误。起初我以为是亚马逊阻止了我的连接,但后来我注意到请求有一个“200 OK”并且它有相应页面的真实 HTML 内容。
让我们以这本书为例:https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
url = 'https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110/ref=sr_1_1?crid=2PPCQEJD706VY&dchild=1&keywords=books+bestsellers+2020+paperback&qid=1598132071&sprefix=book%2Caps%2C234&sr=8-1'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, features="lxml")
price = {}
if soup.select("#buyBoxInner > ul > li > span > .a-text-strike") != []:
price["regular_price"] = float(
soup.select("#buyBoxInner > ul > li > span > .a-text-strike")[0].string[1:].replace(",", "."))
price["promo_price"] = float(soup.select(".offer-price")[0].string[1:].replace(",", "."))
else:
price["regular_price"] = float(soup.select(".offer-price")[0].string[1:].replace(",", "."))
price["currency"] = soup.select(".offer-price")[0].string[0]
这部分工作正常,我可以获得正常价格和促销价格(如果存在),甚至货币。但是当我这样做时:
isbn = soup.select("td.bucket > .content > ul > li")[4].contents[1].string.strip().replace("-", "")
我收到“IndexError:列表索引超出范围”。但是如果我调试代码,内容实际上就在那里!
这是 BeautifulSoup 的错误吗?请求响应是否太长?
最佳答案
亚马逊似乎返回了两个版本的页面。一个在哪里<td class="bucket">
还有一个有几个<span>
标签。此脚本尝试从它们中提取 ISBN:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
url = 'https://www.amazon.co.uk/All-Rage-Cara-Hunter/dp/0241985110'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, features="lxml")
isbn_10 = soup.select_one('span.a-text-bold:contains("ISBN-10"), b:contains("ISBN-10")').find_parent().text
isbn_13 = soup.select_one('span.a-text-bold:contains("ISBN-13"), b:contains("ISBN-13")').find_parent().text
print(isbn_10.split(':')[-1].strip())
print(isbn_13.split(':')[-1].strip())
打印:
0241985110
978-0241985113
关于python - 带有亚马逊图书 ISBN 的间歇性 BeautifulSoup,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63541601/