我正在尝试使用 python requests 模块下载文件,但首先登录到该站点。我可以登录,但是当我发送下载文件的获取请求时,它会再次显示登录页面。
代码:
login_url = 'https://seller.flipkart.com/login'
manifest_url = 'https://seller.flipkart.com/order_management/manifest.pdf'
username = '<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="0d787e687f636c60684d6a606c6461236e6260" rel="noreferrer noopener nofollow">[email protected]</a>'
password = 'password'
params = {'sellerId':'seller_id'}
payload = {'authName':'flipkart',
'username':username,
'password':password}
ses = requests.Session()
ses.post(login_url, data=payload, headers={'Content-Type':'application/x-www-form-urlencoded','Connection':'keep-alive'})
response = ses.get(manifest_url, params=params, headers={'Content-Type':'application/pdf','Connection':'keep-alive'})
print response.status_code
print response.url
print response.content
运行此代码时,我将登录页面的 html 作为内容。 我使用 fiddler 并得到以下数据:
Request URL: https://seller.flipkart.com/order_management/manifest.pdf?sellerId=seller_id
Request Method: GET
sellerId: seller_id
# Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36
Referer: https://seller.flipkart.com/order_management?sellerId=seller_id
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
# Response Headers
Server: nginx
Date: Wed, 30 Dec 2015 13:12:31 GMT
Content-Type: application/pdf
Content-Length: 3652
Connection: keep-alive
X-XSS-Protection: 1; mode=block
strict-transport-security: max-age=31536000; preload
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Cache-Control: private, no-cache, no-store, must-revalidate
Expires: -1
Pragma: no-cache
X-Req-Id: REQ-14d7434a-e429-40e4-801f-6010d7c0b48c
X-Host-Id: 0008
content-disposition: attachment; filename=Manifest-seller_id-30-Dec-2015-18-42-30.pdf
vary: Accept-Encoding
如何下载文件?
最佳答案
设置stream=True
,然后将内容写入文件。
import re
# Send request by setting 'stream=True'
r = ses.get(manifest_url, ..., stream=True)
# Fetch filename
d = r.headers['content-disposition']
fname = re.findall("filename=(.+)", d)
# Write content to file
with open(fname, 'wb') as f:
for chunk in r.iter_content(chunk_size=1024):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
Docs .
关于python - 使用python请求登录后无法下载文件,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34542123/