python - Unable to download the complete file in Python

Tags: python python-3.x web-scraping beautifulsoup wget

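I am using bs4 in Python to download wallpapers from nmgncp.com. However, the code downloads only a 16 KB file, whereas the full image is about 300 KB. Please help. I have even tried the wget.download method.

PS: I am using Python 3.6 on Windows 10.

Here is my code: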

from bs4 import BeautifulSoup
import requests
import datetime
import time
import re
import wget
import os


url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'

html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com/'+a
print(newurl)

response = requests.get(newurl)
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)

Best answer

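The root cause is a protection mechanism on the site: a request for the image must carry a Referer header, otherwise the server redirects it to an HTML page. The 16 KB file being saved is therefore presumably that HTML page rather than the image itself.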
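One way to confirm this kind of diagnosis before writing anything to disk is to check what the server actually returned. The following is a minimal sketch, not part of the original answer: the helper names `looks_like_image` and `save_if_image` are hypothetical, and requiring both an `image/` Content-Type and a known file signature is an assumption that may be too strict for servers that send incorrect headers.

```python
from pathlib import Path

# Leading "magic" bytes of a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (1987 variant)
    b"GIF89a",              # GIF (1989 variant)
)


def looks_like_image(content_type, body):
    """Return True if both the header and the first bytes indicate an image."""
    header_ok = content_type.lower().startswith("image/")
    bytes_ok = body.startswith(IMAGE_SIGNATURES)
    return header_ok and bytes_ok


def save_if_image(response, dest):
    """Write response.content to dest only when it really is an image.

    `response` is any object with `.headers` and `.content`, such as the
    value returned by requests.get() (hypothetical helper, for illustration).
    """
    content_type = response.headers.get("Content-Type", "")
    if not looks_like_image(content_type, response.content):
        return False
    Path(dest).write_bytes(response.content)
    return True
```

With a check like this, a redirect to an HTML page would make `save_if_image` return `False` instead of silently producing a small, unreadable `.jpg` file.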

Fixed source code:

from bs4 import BeautifulSoup
import requests

url = 'http://www.nmgncp.com/dark-wallpaper-1920x1080.html'

# Fetch the wallpaper page and take the src of the first <img> tag.
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
a = soup.find_all('img')[0].get('src')
newurl = 'http://www.nmgncp.com' + a
print(newurl)

# Send a Referer header; without it the server redirects to an HTML page.
response = requests.get(newurl, headers={'referer': newurl})
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
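
The string concatenation used to build `newurl` depends on whether the `src` attribute begins with a slash. As a hedged alternative, the standard library's `urllib.parse.urljoin` resolves the `src` against the page URL in either case. In this sketch the helper name `build_image_request` and the example `src` values are hypothetical; the Referer value follows the answer's choice of using the image URL itself.

```python
from urllib.parse import urljoin

PAGE_URL = "http://www.nmgncp.com/dark-wallpaper-1920x1080.html"


def build_image_request(page_url, src):
    """Return (absolute_image_url, headers) for an <img> src found on page_url."""
    image_url = urljoin(page_url, src)
    # Same Referer choice as the answer: the image URL itself.
    headers = {"Referer": image_url}
    return image_url, headers


# Hypothetical src values, for illustration only.
for src in ("/data/out/example.jpg", "data/out/example.jpg"):
    image_url, headers = build_image_request(PAGE_URL, src)
    print(image_url)
```

Both example values resolve to the same absolute URL, so the result can be passed to `requests.get(image_url, headers=headers)` without worrying about a missing or doubled slash.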

Regarding "python - Unable to download the complete file in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46547676/
