python - Scraping Amazon prices with Python, requests, and bs4

Tags: python beautifulsoup python-requests

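I have a question about scraping the price of an item on Amazon. I am trying to get the price of an item, but unfortunately it does not always work. At random I get status code 503 (Service Unavailable). I can work around this with a while loop that ends once the status code == 200. I would like to understand the underlying reason the server is unavailable, so that I can perhaps fix the root cause instead of working around it. So far the problem only occurs with Amazon.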

Here is my test code, which makes 10 requests. Typically 2 out of 10 requests fail.

import requests
from bs4 import BeautifulSoup


for i in range(10):
    page = requests.get("https://www.amazon.de/Bloodborne-Game-Year-PlayStation-4/dp/B016ZU4FIQ/ref=sr_1_3?ie=UTF8&qid=1519566642&sr=8-3&keywords=bloodborne+ps4")

    if page.status_code != 200:
        print("Error status code: " + str(page.status_code))
        continue

    soup = BeautifulSoup(page.content, "html.parser")

    price = soup.find(id="priceblock_ourprice", class_="a-size-medium a-color-price")


    price_string = price.get_text()

    print(price_string)

Best answer

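Try the script below. It should get you the price. Compared with your code, it sends an explicit User-Agent header with the request and looks up the price element by id alone.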

import requests
from bs4 import BeautifulSoup

URL = "https://www.amazon.de/Bloodborne-Game-Year-PlayStation-4/dp/B016ZU4FIQ/ref=sr_1_3?ie=UTF8&qid=1519566642&sr=8-3&keywords=bloodborne+ps4"
page = requests.get(URL,headers={"User-Agent":"Defined"})
soup = BeautifulSoup(page.content, "html.parser")
price = soup.find(id="priceblock_ourprice").get_text()
print(price)

Output:

EUR 34,99
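
If an occasional 503 still comes back with the header set, the while-loop workaround from the question can be bounded and given a pause between attempts. The following is only a minimal sketch of that idea, not part of the original answer: the name fetch_with_retry and the parameters get, max_attempts, base_delay, and sleep are made up for illustration, and the delay values are arbitrary assumptions.

```python
import time


def fetch_with_retry(url, get, headers=None, max_attempts=5,
                     base_delay=1.0, sleep=time.sleep):
    """Call get(url, headers=...) until it returns status 200.

    Hypothetical helper: `get` is any callable shaped like requests.get.
    Passing it in (and `sleep` too) lets the helper be tried out
    without touching the network.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    response = None
    for attempt in range(max_attempts):
        response = get(url, headers=headers)
        if response.status_code == 200:
            return response
        # Pause before the next try, doubling the wait each time:
        # base_delay, 2 * base_delay, 4 * base_delay, ...
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)

    raise RuntimeError(
        f"giving up after {max_attempts} attempts, "
        f"last status {response.status_code}"
    )


# Usage with the script above (assumes requests is installed):
#   import requests
#   page = fetch_with_retry(URL, requests.get,
#                           headers={"User-Agent": "Defined"})
```

The returned object is whatever get returns, so with requests.get the rest of the script (page.content, BeautifulSoup, and so on) stays the same. Capping the number of attempts avoids looping forever if the server keeps refusing.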

Regarding "python - Scraping Amazon prices with Python, requests, and bs4", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48992650/
