python - Unable to save images from the web using urllib2

Tags: python python-2.7 beautifulsoup urllib2

I want to save some images from a website using Python's urllib2, but when I run my code it saves something other than the images I expect.

Here is my code:

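import urllib2
# Assumption: BeautifulSoup 4. With BeautifulSoup 3 the import would be
# "from BeautifulSoup import BeautifulSoup".
from bs4 import BeautifulSoup

# Fetch the mobile front page while sending a browser-like User-Agent.
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }
url = "http://m.jaaar.com/"
r = urllib2.Request(url, headers=headers)
page = urllib2.urlopen(r).read()

# Collect every <img> tag except the first one.
soup = BeautifulSoup(page)
imgTags = soup.findAll('img')
imgTags = imgTags[1:]


for imgTag in imgTags:
    # Build the image URL by dropping the six characters before the extension.
    imgUrl = "http://www.jaaar.com" + imgTag['src']
    imgUrl = imgUrl[0:-10] + imgUrl[-4:]
    fileName = "khabarnak-" + imgUrl[-12:]
    print fileName

    # Download the image (this request does not send the custom headers).
    imgData = urllib2.urlopen(imgUrl).read()
    print imgUrl

    # Write the downloaded bytes to disk in binary mode.
    output = open("C:\\wamp\\www\\py\\pishkhan\\" + fileName,'wb')
    output.write(imgData)
    output.close()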

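Any suggestions?

Best answer

The site is returning a standard image to you because it detects that you are scraping it. The page request sends a browser-like User-Agent header, but the image request does not. Use the same header-setting "trick" when retrieving the images: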

imgRequest = urllib2.Request(imgUrl, headers=headers)
imgData = urllib2.urlopen(imgRequest).read()
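
As a minimal sketch of the same idea, the header handling can be wrapped in small helpers so that every request, for pages and images alike, carries the same User-Agent. The names `build_request` and `fetch_image` are hypothetical and not part of the original answer, and the import fallback is an assumption added so the sketch also runs on Python 3, where `urllib2` became `urllib.request`.

```python
try:
    import urllib2 as request_lib  # Python 2, as used in the question
except ImportError:
    import urllib.request as request_lib  # Python 3 equivalent (assumption)

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(url, user_agent=USER_AGENT):
    """Return a Request for url that carries the browser-like User-Agent."""
    return request_lib.Request(url, headers={'User-Agent': user_agent})


def fetch_image(url, user_agent=USER_AGENT):
    """Download url with the custom header and return the raw bytes."""
    response = request_lib.urlopen(build_request(url, user_agent))
    try:
        return response.read()
    finally:
        response.close()


# Usage inside the question's loop (hypothetical):
#     imgData = fetch_image(imgUrl)
```

Routing both the page download and the image downloads through one helper makes it harder to forget the header on one of them, which is what happened in the question's code.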

Regarding python - Unable to save images from the web using urllib2, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14439809/
