I want to extract data with BeautifulSoup in Python.
My document:
<div class="listing-item" data-id="309531" data-score="0">
<div class="thumb">
<a href="https://res.cloudinary.com/">
<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
</a>
</div>
</div>
Here I want to get the background image URL from this element:
<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
My code:
from textwrap import shorten
from bs4 import BeautifulSoup
from urllib.parse import parse_qsl, urljoin, urlparse
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'
print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))
for page in range(0, 40):  # increase the upper bound to scrape more pages
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')
    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        # thumb.get_text() is empty here: the URL lives in the style attribute, not in the text
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), thumb.get_text().strip()))
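How can I get the background image URL from the document?
Best answer
You can get the URL by searching inside your thumb element: find its inner div, read the style attribute, and take the text between "url(" and ");".
You can try this code: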
from textwrap import shorten
from bs4 import BeautifulSoup
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'
print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Image URL'))
for page in range(0, 1):  # increase the upper bound to scrape more pages
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')
    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        # The inner div has no class, so reach it through the .thumb element
        # and cut the URL out of its style attribute.
        image_url = thumb.find('div').get('style').split('url(')[1].split(');')[0]
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), image_url))
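The split-based extraction above raises an error if a div has no style attribute or the style does not end in ");". As a hedged alternative, here is a minimal sketch that pulls the URL out of the style string with a regular expression instead. The helper name extract_background_url is hypothetical (it is not part of BeautifulSoup), and the pattern assumes the URL itself contains no parentheses, which holds for the sample document.

```python
import re

# Matches background-image: url(...) with optional whitespace and optional quotes.
# Assumption: the URL contains no parentheses of its own.
_BG_URL = re.compile(r"background-image\s*:\s*url\(\s*(['\"]?)(.*?)\1\s*\)", re.IGNORECASE)


def extract_background_url(style):
    """Return the background-image URL found in a style string, or None.

    Hypothetical helper for illustration; `style` is the value of a tag's
    style attribute and may be None when the attribute is missing.
    """
    if not style:
        return None
    match = _BG_URL.search(style)
    return match.group(2) if match else None


if __name__ == "__main__":
    sample = ("background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/"
              "co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/"
              "co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/"
              "c_fit,w_200/abu-dhabi-plate_private-car_classic);")
    print(extract_background_url(sample))
```

Inside the loop it could be called as extract_background_url(thumb.find('div').get('style')), after checking that thumb.find('div') is not None. It returns None rather than raising when no background image is present, so the row can still be printed.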
This answer about "python - BeautifulSoup: extracting a value without a class in Python" comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/59371533/