python - How to download a file using Python in a 'smarter' way?

I need to download several files over HTTP in Python.

The most obvious way to do it is to use urllib2:

import urllib2

u = urllib2.urlopen('http://server.com/file.html')
# Open in binary mode so the downloaded bytes are written unchanged.
localFile = open('file.html', 'wb')
localFile.write(u.read())
localFile.close()

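But I will have to deal with URLs that are nasty in some way, for example: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When downloaded via a browser, the file has a human-readable name, e.g. accounts.pdf.

Is there any way to handle that in Python, so that I don't need to know the file names and hardcode them into my script?

Best answer

Download scripts like that tend to push a header telling the user agent what to name the file: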

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.

Another thread has a bit of code for grabbing Content-Disposition.

import urllib2
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
# info() returns the response headers; the header may be absent for some URLs.
remotefile.info()['Content-Disposition']
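
To tie the pieces together, here is a minimal sketch of a download helper that names the file from the Content-Disposition header. It assumes Python 3, where urllib2 became urllib.request, and the helper names pick_filename and download, the default name download.bin, and the fallback to the last URL path segment are illustrative choices, not something from the answer above.

```python
import os
import shutil
from email.message import Message
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def pick_filename(headers, url, default="download.bin"):
    """Choose a local file name from response headers, falling back to the URL."""
    # get_filename() reads the filename parameter of Content-Disposition.
    name = headers.get_filename()
    if not name:
        # No usable header: fall back to the last path segment of the URL.
        name = unquote(urlsplit(url).path).rsplit("/", 1)[-1]
    # Keep only the final component so a header cannot point outside the cwd.
    name = os.path.basename(name.replace("\\", "/"))
    if name in ("", ".", ".."):
        return default  # hypothetical default name, chosen for this sketch
    return name


def download(url):
    """Download url into the current directory and return the chosen file name."""
    with urlopen(url) as response:
        # response.headers is an email.message.Message subclass.
        name = pick_filename(response.headers, url)
        with open(name, "wb") as local_file:
            shutil.copyfileobj(response, local_file)
    return name


# Offline check with a hand-built header, no network needed.
demo = Message()
demo["Content-Disposition"] = 'attachment; filename="the filename.ext"'
print(pick_filename(demo, "http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf"))
```

Calling download('http://example.com/somefile.zip') would then save the file under whatever name the server suggests, or somefile.zip if it suggests none. Stripping the directory part is a precaution, since the header value is supplied by the server.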

Regarding python - How to download a file using Python in a 'smarter' way?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/862173/
