python - Scraping images with Python and changing their names

Tags: python web-scraping

I have a project that scrapes images from Tumblr with Python. I want to download the images found at the links I scrape.

Here is the full code:

import requests
from bs4 import BeautifulSoup
import shutil
search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")

data = {}
for article in articles:
    try:
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue


for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')

Following the answer to this post, I changed the code to use requests and shutil:

for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')

Now I get this error:

Traceback (most recent call last):
  File "/home/user/folder/Information.py", line 28, in <module>
    url.raw.decode_content = True
AttributeError: 'str' object has no attribute 'raw'

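Best answer

The values in data are plain URL strings, not responses, which is why url.raw fails. You have to make a second request with each image URL. You can then read the response in raw form and save the image.

Replace the download loop with the code below -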

for source, image_urls in data.items():
    for url in image_urls:
        # request the image itself; stream=True keeps the body unread until we copy it
        img_scrape = requests.get(url, stream=True)

        if img_scrape.status_code == 200:
            with open(source,'wb') as f:
                # decode gzip/deflate content encoding while reading the raw stream
                img_scrape.raw.decode_content = True

                # copy the raw response stream into the file
                shutil.copyfileobj(img_scrape.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')

Output -

Image sucessfully Downloaded:  pics-bae
Image sucessfully Downloaded:  pics-bae
Image sucessfully Downloaded:  laravel
Image sucessfully Downloaded:  huariqueje
Image sucessfully Downloaded:  sweetd3lights
Image sucessfully Downloaded:  shesinthegrove
Image sucessfully Downloaded:  careful-disorder
Image sucessfully Downloaded:  beifongkendo
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  bradsbackpack
Image sucessfully Downloaded:  pensamentsisomnis
Image sucessfully Downloaded:  frankfurtphoto
........

..........
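
The output above lists some sources more than once (pics-bae twice, traveltoslovenia three times). Since the file is opened with the source name alone, it looks like each later image from the same source overwrites the earlier one, and the saved files have no extension. Below is a minimal sketch of one way to give every image its own name. The helper build_filenames, the _1/_2 numbering scheme, the .jpg fallback extension, and the example.com URLs are all illustrative assumptions, not part of the original answer.

```python
import os
from collections import defaultdict
from urllib.parse import urlparse


def build_filenames(data, default_ext=".jpg"):
    """Return (url, filename) pairs with a per-source counter in each name.

    Hypothetical helper: `data` has the same shape as in the question,
    a dict mapping a source name to a list of image URL strings.
    """
    counters = defaultdict(int)
    pairs = []
    for source, image_urls in data.items():
        for url in image_urls:
            counters[source] += 1
            # Reuse the extension from the URL path, or fall back to the default.
            ext = os.path.splitext(urlparse(url).path)[1] or default_ext
            pairs.append((url, f"{source}_{counters[source]}{ext}"))
    return pairs


# Made-up URLs, only to show the resulting names.
example = {
    "pics-bae": ["https://example.com/a/one.jpg", "https://example.com/a/two.png"],
    "laravel": ["https://example.com/b/three"],
}
for url, filename in build_filenames(example):
    print(filename)
# prints:
# pics-bae_1.jpg
# pics-bae_2.png
# laravel_1.jpg
```

In the download loop, you could iterate over build_filenames(data) and pass filename to open() instead of source. This sketch does not sanitize source names, so a name containing characters that are invalid in file names would still need handling.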

For "python - Scraping images with Python and changing their names", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/72367377/
