python-3.x - curl returns a response, but Python does not and the requests call never terminates?

Tags: python-3.x curl web-scraping urllib3

I am trying the following curl request:

curl 'https://www.nseindia.com/api/historical/cm/equity?symbol=COALINDIA&series=\[%22EQ%22\]&from=03-05-2020&to=03-05-2021&csv=true' \
-H 'authority: www.nseindia.com' \
-H 'accept: */*' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36' \
-H 'x-requested-with: XMLHttpRequest' \
-H 'sec-gpc: 1' \
-H 'sec-fetch-site: same-origin' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-dest: empty' \
-H 'referer: https://www.nseindia.com/get-quotes/equity?symbol=COALINDIA' \
-H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
-H 'cookie: ak_bmsc=2D5CCD6F330B77016DD02ADFD8BADB8A58DDD69E733C0000451A9060B2DF0E5C~pllIy1yQvFABwPqSfaqwV4quP8uVOfZBlZe9dhyP7+7vCW/YfXy32hQoUm4wxCSxUjj8K67PiZM+8wE7cp0WV5i3oFyw7HRmcg22nLtNY4Wb4xn0qLv0kcirhiGKsq4IO94j8oYTZIzN227I73UKWQBrCSiGOka/toHASjz/R10sX3nxqvmMSBlWvuuHkgKOzrkdvHP1YoLPMw3Cn6OyE/Z2G3oc+mg+DXe8eX1j8b9Hc=; nseQuoteSymbols=[{"symbol":"COALINDIA","identifier":null,"type":"equity"}]; nsit=X5ZCfROTTuLVwZzLBn7OOtf0; AKA_A2=A; bm_mi=6CE0B82205ACE5A1F72250ACDDFF563E~LZ4/HQ257rSMBPCrxy0uSDvrSxj4hHpLQqc8R5JZOzUZYo1OqZg5Q/GOt88XNtMbsWM8bB22vtCXzvksGwPcC/bH2nPFEZr0ci6spQ4GOpCa/TM7soc02HVf0tyDTkmg/ZdLZlWzond4r0vn+QpSB7f3fiVza1Gdx9OaFL1i3rvqe1OKmFONreHEue20PL0hlREVWeLcFM/5DxKArPwzCSopPp62Eea1510iivl7GmY=; nseappid=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJhcGkubnNlIiwiYXVkIjoiYXBpLm5zZSIsImlhdCI6MTYyMDA2MTQ5OSwiZXhwIjoxNjIwMDY1MDk5fQ.YBTQ0MqRayD3QBM3V6zUt5zbRRICkbIhWWNedkDYrdU; bm_sv=C49B743B48F174C77F3DDAD188AA6D87~bm5TD36snlaRLx9M5CS+FOUicUcbVV3OIKjZU2WLwd1PtHYUum7hnBfYeUCDv+5Xdb9ADklnmm1cwZGJJbiBstcA6c5vju53C7aTFBorl8SJZjBN/4ku61oz0ncrQYCaSxkFGkRRY9VMWm6SpQwHXfMsUzc/Qk7301zs7KZuGCY=' \
--compressed 
This returns the expected response (sample below):
"Date ","series ","OPEN ","HIGH ","LOW ","PREV. CLOSE ","ltp ","close ","vwap ","52W H","52W L ","VOLUME ","VALUE ","No of trades "
"03-May-2021","EQ","133.00","133.45","131.20","133.05","132.20","132.20","132.21","163.00","109.55",10262391,"1,356,811,541.80",59409
But if I use the Python script below to fetch the data:
import requests

headers = {
'authority': 'www.nseindia.com',
'accept': '*/*',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36',
'x-requested-with': 'XMLHttpRequest',
'sec-gpc': '1',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.nseindia.com/get-quotes/equity?symbol=COALINDIA',
'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8','cookie':'ak_bmsc=2D5CCD6F330B77016DD02ADFD8BADB8A58DDD69E733C0000451A9060B2DF0E5C~pllIy1yQvFABwPqSfaqwV4quP8uVOfZBlZe9dhyP7+7vCW/YfXy32hQoUm4wxCSxUjj8K67PiZM+8wE7cp0WV5i3oFyw7HRmcg22nLtNY4Wb4xn0qLv0kcirhiGKsq4IO94j8oYTZIzN227I73UKWQBrCSiGOka/toHASjz/R10sX3nxqvmMSBlWvuuHkgKOzrkdvHP1YoLPMw3Cn6OyE/Z2G3oc+mg+DXe8eX1j8b9Hc=; nseQuoteSymbols=[{"symbol":"COALINDIA","identifier":null,"type":"equity"}]; nsit=X5ZCfROTTuLVwZzLBn7OOtf0; AKA_A2=A; bm_mi=6CE0B82205ACE5A1F72250ACDDFF563E~LZ4/HQ257rSMBPCrxy0uSDvrSxj4hHpLQqc8R5JZOzUZYo1OqZg5Q/GOt88XNtMbsWM8bB22vtCXzvksGwPcC/bH2nPFEZr0ci6spQ4GOpCa/TM7soc02HVf0tyDTkmg/ZdLZlWzond4r0vn+QpSB7f3fiVza1Gdx9OaFL1i3rvqe1OKmFONreHEue20PL0hlREVWeLcFM/5DxKArPwzCSopPp62Eea1510iivl7GmY=; nseappid=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJhcGkubnNlIiwiYXVkIjoiYXBpLm5zZSIsImlhdCI6MTYyMDA2MTQ5OSwiZXhwIjoxNjIwMDY1MDk5fQ.YBTQ0MqRayD3QBM3V6zUt5zbRRICkbIhWWNedkDYrdU; bm_sv=C49B743B48F174C77F3DDAD188AA6D87~bm5TD36snlaRLx9M5CS+FOUicUcbVV3OIKjZU2WLwd1PtHYUum7hnBfYeUCDv+5Xdb9ADklnmm1cwZGJJbiBstcA6c5vju53C7aTFBorl8SJZjBN/4ku61oz0ncrQYCaSxkFGkRRY9VMWm6SpQwHXfMsUzc/Qk7301zs7KZuGCY=',}

params = (
('symbol', 'COALINDIA'),
('series', '/["EQ"/]'),
('from', '30-04-2021'),
('to', '03-05-2021'),
('csv', 'true'),
)

response = requests.get('https://www.nseindia.com/api/historical/cm/equity', headers=headers, params=params)
It hangs on the last line.
I am using Python 3.9 and urllib3.
I don't know what the problem is.
This URL downloads a CSV file from the website.

Best answer

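You have to jump through a few hoops in Python to get the file you want. Mostly, you need to get the cookie part of the request headers right; otherwise you will keep getting a 401 status code.
First, you need the regular cookies from the authority, www.nseindia.com. Then you need the bm_sv cookie from https://www.nseindia.com/json/quotes/equity-historical.json. Finally, add something called nseQuoteSymbols.
Glue all of that together and make the request to get the file.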
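The gluing step on its own can be sketched as a small helper. This is only an illustration: `build_cookie_header` and its `extra_pairs` parameter are hypothetical names, and the cookie values in the usage lines are placeholders, not real NSE cookies.

```python
def build_cookie_header(*cookie_dicts, extra_pairs=()):
    """Merge cookie dicts and raw 'name=value' strings into one header value.

    Hypothetical helper, not part of requests.
    """
    merged = {}
    for cookie_dict in cookie_dicts:
        merged.update(cookie_dict)  # later dicts win on duplicate names
    parts = [f"{name}={value}" for name, value in merged.items()]
    # Raw pairs may carry a trailing "; " (as nseQuoteSymbols does); trim it
    # so that the join below is the only place separators are added.
    parts.extend(pair.strip().rstrip(";").strip() for pair in extra_pairs)
    return "; ".join(parts)


# Placeholder values, for illustration only.
header = build_cookie_header(
    {"nsit": "abc"},
    {"bm_sv": "xyz"},
    extra_pairs=['nseQuoteSymbols=[{"symbol":"COALINDIA"}]; '],
)
print(header)
```

Joining in one place means every pair is separated by exactly one "; ", so no cookie value can run into the name of the next one.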
Here is the full script:

from urllib.parse import urlencode

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/88.0.4324.182 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'referer': 'https://www.nseindia.com/get-quotes/equity?symbol=COALINDIA',
}

payload = {
    "symbol": "COALINDIA",
    "series": '["EQ"]',
    "from": "04-04-2021",
    "to": "04-05-2021",
    "csv": "true",
}

api_endpoint = "https://www.nseindia.com/api/historical/cm/equity?"

nseQuoteSymbols = 'nseQuoteSymbols=[{"symbol":"COALINDIA","identifier":null,"type":"equity"}]; '


def make_cookies(cookie_dict: dict) -> str:
    return "; ".join(f"{k}={v}" for k, v in cookie_dict.items())


with requests.Session() as connection:
    # 1. Regular cookies from the site's landing page.
    authority = connection.get("https://www.nseindia.com", headers=headers)
    # 2. The bm_sv cookie from the historical-quotes JSON resource.
    historical_json = connection.get("https://www.nseindia.com/json/quotes/equity-historical.json", headers=headers)
    bm_sv_string = make_cookies(historical_json.cookies.get_dict())

    # 3. Glue everything into one cookie header; the "; " keeps the last
    #    authority cookie from running into nseQuoteSymbols.
    cookies = make_cookies(authority.cookies.get_dict()) + "; " + nseQuoteSymbols + bm_sv_string
    connection.headers.update({**headers, **{"cookie": cookies}})

    # 4. Request the CSV and save it under the server-supplied file name.
    the_real_slim_shady = connection.get(f"{api_endpoint}{urlencode(payload)}")
    csv_file = the_real_slim_shady.headers["Content-disposition"].split("=")[-1]
    with open(csv_file, "wb") as f:
        f.write(the_real_slim_shady.content)
Output: a .csv file, saved under the file name taken from the Content-disposition header.
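Splitting the Content-disposition header on `=` works when the file name is sent bare, but it leaves the quotes in place if the server wraps the name in them. A minimal sketch of a more forgiving parse, using the standard library's `email.message.Message`; the function name, the `download.csv` fallback and the sample header values are assumptions made for illustration.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default="download.csv"):
    """Extract the filename parameter from a Content-Disposition value.

    Hypothetical helper; `default` is an assumed fallback name.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # handles quoted and unquoted parameter values
    if not name:
        return default
    # Keep only the final path component so the server cannot choose a directory.
    return os.path.basename(name) or default


# Sample header values, made up for illustration.
print(filename_from_content_disposition('attachment; filename="quotes.csv"'))
print(filename_from_content_disposition("attachment; filename=quotes.csv"))
```

In the script above, this would take the place of the `.split("=")[-1]` expression, with the header value passed in as a string.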
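On the hang itself: requests applies no timeout unless one is passed, so a request the server never answers blocks indefinitely, which matches what the question describes. A small sketch of a wrapper that fails fast instead; `get_or_fail` and the 5 second / 30 second values are assumptions, not something from the answer.

```python
def get_or_fail(session, url, timeout=(5, 30), **kwargs):
    """GET `url` through `session`, raising instead of waiting forever.

    Hypothetical wrapper. `session` is expected to be a requests.Session;
    `timeout` is requests' (connect, read) pair in seconds, and the
    default values here are arbitrary.
    """
    response = session.get(url, timeout=timeout, **kwargs)
    response.raise_for_status()  # turns a 401 or other 4xx/5xx into an exception
    return response


# Usage with requests (not executed here because it needs network access):
#
#     import requests
#     with requests.Session() as connection:
#         page = get_or_fail(connection, "https://www.nseindia.com", headers=headers)
```

With this in place, a stalled connection should surface as a `requests.exceptions.Timeout` and a rejected cookie set as an `HTTPError`, rather than a script that never returns.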

Regarding "python-3.x - curl returns a response, but Python does not and the requests call never terminates?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67373373/
