python - Scraping images with Python and changing their names

I have a project that scrapes images from Tumblr with Python. I want to download the images found at the links I scraped.

Here is the full code:

import requests
from bs4 import BeautifulSoup
import shutil
search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")

data = {}
for article in articles:
    try:
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue


for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')

Following the answers to this post, I changed the download loop to use requests and shutil:

for source, image_urls in data.items():
    for url in image_urls:
        if posts_scrape.status_code == 200:
            url.raw.decode_content = True
            with open(source,'wb') as f:
                shutil.copyfileobj(url.raw, f)
            print('Image sucessfully Downloaded: ',source)
        else:
            print('Image Couldn\'t be retrieved')

Now I get this error:

Traceback (most recent call last):
  File "/home/user/folder/Information.py", line 28, in <module>
    url.raw.decode_content = True
AttributeError: 'str' object has no attribute 'raw'

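Best answer

Each `url` in `image_urls` is a plain string, not a response object, which is why it has no `raw` attribute. You have to make a new request with the image URL. You can then read that response in raw form and save the image.

Replace the download loop with the code below -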

for source, image_urls in data.items():
    for url in image_urls:
        # request the image itself; stream=True leaves the body unread so .raw can be copied
        img_scrape = requests.get(url, stream=True)

        if img_scrape.status_code == 200:
            with open(source, 'wb') as f:
                # decode gzip/deflate content-encoding while reading the raw stream
                img_scrape.raw.decode_content = True

                # copy the raw response stream into the file
                shutil.copyfileobj(img_scrape.raw, f)
            print('Image sucessfully Downloaded: ', source)
        else:
            print('Image Couldn\'t be retrieved')

Output -

Image sucessfully Downloaded:  pics-bae
Image sucessfully Downloaded:  pics-bae
Image sucessfully Downloaded:  laravel
Image sucessfully Downloaded:  huariqueje
Image sucessfully Downloaded:  sweetd3lights
Image sucessfully Downloaded:  shesinthegrove
Image sucessfully Downloaded:  careful-disorder
Image sucessfully Downloaded:  beifongkendo
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  traveltoslovenia
Image sucessfully Downloaded:  bradsbackpack
Image sucessfully Downloaded:  pensamentsisomnis
Image sucessfully Downloaded:  frankfurtphoto
........

..........
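
Note that several blogs appear more than once in the output. Because the file is opened with the blog name alone, each later image from the same blog overwrites the earlier one. A minimal sketch of one way to give every image its own name is below. The helper `unique_name`, its `default_ext` parameter, and the example.com URLs are hypothetical and only illustrate the idea; they are not part of the original code.

```python
import os
from urllib.parse import urlparse


def unique_name(source, index, url, default_ext=".jpg"):
    """Build a file name such as 'pics-bae_1.jpg' for the index-th image of a blog."""
    # Take the extension from the URL path, ignoring any query string.
    ext = os.path.splitext(urlparse(url).path)[1] or default_ext
    # Replace characters that are awkward in file names with underscores.
    safe_source = "".join(
        c if c.isalnum() or c in "-_" else "_" for c in source.strip()
    )
    return f"{safe_source}_{index}{ext}"


# Hypothetical data in the same shape as the `data` dict built by the scraper.
data = {
    "pics-bae": [
        "https://example.com/a/s500x750/one.jpg",
        "https://example.com/b/s500x750/two.png",
    ]
}

for source, image_urls in data.items():
    # enumerate() numbers the images of each blog starting from 1
    for index, url in enumerate(image_urls, start=1):
        print(unique_name(source, index, url))
```

In the download loop, the inner `for` would then use `enumerate(image_urls, start=1)` and open `unique_name(source, index, url)` instead of `source`.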

For more on python - scraping images with Python and changing their names, see the similar question on Stack Overflow: https://stackoverflow.com/questions/72367377/
