python - Modify URL parameter to download images from multiple websites

Tags: python web-scraping

I am trying to download the images for every case listed in the CaseIDs array, but it does not work. I want the code to run for all of the cases.

from bs4 import BeautifulSoup
import requests as rq
from urllib.parse import urljoin
from tqdm import tqdm

CaseIDs = [100237, 99817, 100271]

with rq.session() as s:
    for caseid in tqdm(CaseIDs):
        url = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID= {caseid}'
        r = s.get(url)
        soup = BeautifulSoup(r.text, "html.parser")

        url = urljoin(url, soup.find('a', text='Text and Images Only')['href'])
        r = s.get(url)
        soup = BeautifulSoup(r.text, "html.parser")

        links = [urljoin(url, i['src']) for i in soup.select('img[src^="GetBinary.aspx"]')]

        count = 0
        for link in links:
            content = s.get(link).content
            with open("test_image" + str(count) + ".jpg", 'wb') as f:
                f.write(content)
            count += 1

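Best answer

The URL in the question is a plain string literal, not an f-string, so {caseid} is sent to the server as literal text instead of the case number, and there is also a stray space after "CaseID=". Try using format() like this: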

url = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID={}'.format(caseid)
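The same fix can also be sketched by letting urllib.parse.urlencode build the query string, which avoids stray spaces in the URL. This is only a sketch: the helper names build_case_url and image_filename are hypothetical and not part of the original post, and the per-case file name is an added assumption so that images from one case do not overwrite those of another (the question's loop reuses test_image0.jpg, test_image1.jpg, ... for every case).

```python
from urllib.parse import urlencode

# Base address taken from the question; the query string is built separately.
BASE_URL = "https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx"


def build_case_url(caseid):
    """Return the case page URL for one case ID (hypothetical helper)."""
    # urlencode joins the parameters as "xsl=main.xsl&CaseID=<id>"
    query = urlencode({"xsl": "main.xsl", "CaseID": caseid})
    return f"{BASE_URL}?{query}"


def image_filename(caseid, index):
    """Return a file name that is unique per case and per image (assumed naming scheme)."""
    return f"case{caseid}_image{index}.jpg"


CaseIDs = [100237, 99817, 100271]

for caseid in CaseIDs:
    # Each iteration now produces a different URL, one per case.
    print(build_case_url(caseid))
```

In the question's loop, build_case_url(caseid) would replace the hard-coded url string, and image_filename(caseid, count) would replace "test_image" + str(count) + ".jpg".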

Regarding "python - Modify URL parameter to download images from multiple websites", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59819738/
