python - How to format scraper output

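I am trying to extract prices from a website in order to build a scraper, using the program I wrote below. To get all the HTML I used BeautifulSoup with the default html.parser. I then tried to narrow the data down with a variable named generale, set to soup.findAll("span"). Now I need to clean up the list (I think that is what it creates) further to get the prices, but I am stuck. Any suggestions? I don't know how to approach the problem.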

import smtplib
import time

from bs4 import BeautifulSoup as bs
import requests

URL = "https://www.allkeyshop.com/blog/buy-battlefield-5-cd-key-compare-prices/"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"}

def Check_page1():
    page = requests.get(URL, headers=headers)
    soup = bs(page.content, 'html.parser')
    generale = soup.findAll('span')
    price = ?
    print(price)
    print(generale)

print(Check_page1())

Best answer

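If you look at the page's source, you can see that the values you are looking for are in <span> elements with the class name price, which can be parsed like this: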

import time

import requests
from bs4 import BeautifulSoup as bs

URL = "https://www.allkeyshop.com/blog/buy-battlefield-5-cd-key-compare-prices/"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"}

def CheckPage1():
    page = requests.get(URL, headers=headers)
    soup = bs(page.content, 'html.parser')

    # all spans with prices
    span_prices = soup.findAll("span", {"class": "price"})

    # to get all prices you need to extract text or content attribute
    for span in span_prices:
        price = span.text
        # remove whitespace and print price
        print(price.strip())

        # to get the price without the currency sign, uncomment one of these lines
        # print(price.strip()[:-1])
        # print(price.strip().strip('€'))

CheckPage1()
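
If the prices are needed as numbers rather than text (for example, to compare them against a threshold), the stripped strings still have to be converted. The following is a minimal sketch of one way to do that, not part of the original answer. The helper name parse_price and the sample strings are hypothetical, and the code assumes each price has at most one decimal separator (either "." or ",") and no thousands separator.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the numeric part of a price string as a Decimal, or None.

    Hypothetical helper. Assumes prices look like "12.34€" or "9,99€":
    one optional decimal separator and no thousands separator.
    """
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        # No digits found, e.g. a placeholder such as "N/A"
        return None
    # Normalize a decimal comma to a decimal point before converting
    return Decimal(match.group().replace(",", "."))


# Made-up strings standing in for the span.text values from the page
samples = ["  12.34€ ", "9,99€", "N/A"]
prices = [parse_price(s) for s in samples]
print(prices)
```

Inside the loop in CheckPage1, this could be called as parse_price(span.text), skipping any result that is None. Decimal is used here instead of float so that monetary values are not affected by binary rounding.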

Regarding "python - How to format scraper output", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57349796/
