python-3.x - Determine whether a URL is a PDF or an HTML file

Tags: python-3.x python-requests

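I am requesting URLs with the requests package in Python (e.g. file = requests.get(url)). The URLs have no file extension; sometimes an HTML file comes back and sometimes a PDF.
Is there a way to tell whether the returned file is a PDF or HTML, or more generally, what its format is? The browser is able to work it out, so I assume it must be indicated somewhere in the response.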

Best answer

The format is given in the Content-Type response header, which for these two cases will contain either text/html or application/pdf:

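 import requests

 r = requests.get('http://example.com/file')
 # Default to '' so the membership tests below still work if the header is missing.
 content_type = r.headers.get('content-type', '')

 if 'application/pdf' in content_type:
     ext = '.pdf'
 elif 'text/html' in content_type:
     ext = '.html'
 else:
     ext = ''
     print('Unknown type: {}'.format(content_type))

 # r.content holds the full body; r.raw is already consumed unless stream=True was used.
 with open('myfile' + ext, 'wb') as f:
     f.write(r.content)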
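
The question also asks about the more general case, and some servers send a missing or generic Content-Type. As a rough sketch only (the helper name guess_extension and its fallback rules are assumptions for illustration, not part of the original answer or of the requests library), the header check can be backed up by looking at the first bytes of the body. PDF files begin with the marker %PDF-, and HTML documents usually begin with a doctype or an <html> tag:

```python
def guess_extension(content_type, body):
    """Return '.pdf', '.html', or '' for a Content-Type value and response body.

    Hypothetical helper: trusts the header first, then sniffs the body.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or '').split(';')[0].strip().lower()
    if media_type == 'application/pdf':
        return '.pdf'
    if media_type in ('text/html', 'application/xhtml+xml'):
        return '.html'

    # Fallback: inspect the start of the body, ignoring leading whitespace.
    head = body[:1024].lstrip().lower()
    if head.startswith(b'%pdf-'):
        return '.pdf'
    if head.startswith((b'<!doctype html', b'<html')):
        return '.html'
    return ''
```

With a response r from requests.get, this would be called as guess_extension(r.headers.get('content-type'), r.content). The body check is a heuristic and only covers these two formats; anything else falls through to an empty extension.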

Regarding "python-3.x - Determine whether a URL is a PDF or an HTML file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38690586/
