I am trying to extract prices from a website with the crawler program I wrote below. To fetch all the HTML I use BeautifulSoup with the default html.parser. I then try to narrow the result down using a variable called generale, set to soup.findAll("span"). Next I need to clean up the list it produces (I think) further to get the price, but I am stuck. Any suggestions? I don't know how to approach the problem.
import smtplib
import time
from bs4 import BeautifulSoup as bs
import requests
URL = "https://www.allkeyshop.com/blog/buy-battlefield-5-cd-key-compare-prices/"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"}
def Check_page1():
    page = requests.get(URL, headers=headers)
    soup = bs(page.content, 'html.parser')
    generale = soup.findAll('span')
    price = ?  # <- stuck here
    print(price)
    print(generale)

print(Check_page1())
Accepted answer
When you look at the page's source code, you can see that the price you are looking for sits in a <span> with the class name price, which can be parsed like this:
import requests
from bs4 import BeautifulSoup as bs
URL = "https://www.allkeyshop.com/blog/buy-battlefield-5-cd-key-compare-prices/"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"}
def CheckPage1():
    page = requests.get(URL, headers=headers)
    soup = bs(page.content, 'html.parser')
    # all spans with prices
    span_prices = soup.findAll("span", {"class": "price"})
    # to get all prices you need to extract the text of each span
    for span in span_prices:
        price = span.text
        # remove surrounding whitespace and print the price
        print(price.strip())
        # to get prices without the currency sign, uncomment one of these lines
        # print(price.strip()[:-1])
        # print(price.strip().strip('€'))

CheckPage1()
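If you want to compare or sort the scraped prices, the stripped strings still need to be converted to numbers. As a minimal sketch, assuming the site formats prices in the European style (comma as decimal separator, trailing € sign, e.g. "59,99€"), a hypothetical helper could look like this:

```python
def parse_price(text):
    """Convert a scraped price string such as '59,99€' to a float.

    Assumes a comma decimal separator and a trailing currency sign;
    adjust for other locales as needed.
    """
    # drop surrounding whitespace and a trailing currency symbol
    cleaned = text.strip().rstrip("€$£").strip()
    # swap the decimal comma for a dot so float() accepts it
    return float(cleaned.replace(",", "."))

print(parse_price("59,99€"))  # 59.99
```

With numeric values you could then, for example, call min() on the list of parsed prices to find the cheapest offer.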
Regarding "python - How to format crawler output", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57349796/