python - How to extract data in a column from a page using Beautiful Soup

Tags: python beautifulsoup request python-requests

I am trying to capture the data shown in the bullet points on the page.

Link: https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/

The data needs to be extracted using XPath.

Data to be extracted:

    4 Door Sedan

    4 Cylinder, 1.8 Litre

    Constantly Variable Transmission, Front Wheel Drive

    Petrol - Unleaded ULP

    6.4 L/100km 

I tried this:

import requests
import pandas as pd
from lxml import html
from bs4 import BeautifulSoup

cars = []

urls = ['https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/']

for url in urls:
    car_data = {}
    headers = {'User-Agent': 'Mozilla/5.0'}
    page = requests.get(url, headers=headers)
    tree = html.fromstring(page.content)
    # Absolute XPath copied from the browser; evaluate it once and reuse the result
    nodes = tree.xpath('/html/body/div[1]/div[2]/div/div[1]/div[1]/div[4]/div/div')
    if nodes:
        car_data["namings"] = nodes[0]
    # Collect the result for this URL
    cars.append(car_data)


Best answer

You have already imported BeautifulSoup, so why not use a CSS class selector?

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
info = [i.text.strip() for i in soup.select('.dgi-')]


You can also print the values one per line:

for i in soup.select('.dgi-'):
    print(i.text.strip())
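
Since the question asks for XPath, the same class-based selection can also be sketched with lxml. This is only an illustration under stated assumptions: the sample markup below is hypothetical and stands in for the real page, the helper name extract_by_class is made up for this example, and only the class name dgi- comes from the answer above. A relative XPath that matches on the class is usually less brittle than an absolute path such as /html/body/div[1]/..., which breaks whenever the page layout shifts.

```python
from lxml import html

# Hypothetical markup standing in for the real page; only the class
# name "dgi-" is taken from the answer above.
SAMPLE_HTML = """
<html><body>
  <div class="features">
    <div class="item dgi-"> 4 Door Sedan </div>
    <div class="item dgi-">4 Cylinder, 1.8 Litre</div>
    <div class="item other">Not a feature</div>
    <div class="dgi-">Petrol - Unleaded ULP</div>
  </div>
</body></html>
"""


def extract_by_class(markup, class_name="dgi-"):
    """Return the stripped text of every element carrying class_name."""
    tree = html.fromstring(markup)
    # Pad the class attribute with spaces so class_name matches only as a
    # whole token, not as a substring of some other class name.
    nodes = tree.xpath(
        '//*[contains(concat(" ", normalize-space(@class), " "), $cls)]',
        cls=f" {class_name} ",
    )
    return [node.text_content().strip() for node in nodes]


features = extract_by_class(SAMPLE_HTML)
for feature in features:
    print(feature)
```

To run this against the live page, pass the content of the requests response (r.content in the answer above) instead of SAMPLE_HTML; whether the real elements still carry the dgi- class is an assumption that should be checked in the browser first.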

Regarding "python - How to extract data in a column from a page using Beautiful Soup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57720937/
