python - How to fill in date options and download from a radio-button form using Python requests

I'm trying to write a Python script to scrape data from this website

https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315

and download yesterday's CSV. As you can see, it has two date menu options, a CSV radio button, and a submit button.

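I think maybe I could use the requests library? I'm not looking for someone to do this for me, but if anyone could point me in the right direction, that would be great!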

I know this is too simple, but here's what I have so far:

import requests

print('Download Starting...')

url = 'https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315'

r = requests.get(url)

filename = url.split('/')[-1] # take the last path segment of the URL (here it still includes the query string)

with open(filename,'wb') as output_file:
    output_file.write(r.content)

print('done')

Best answer

You first need to use requests.Session() to store the cookies and resend them on subsequent requests. The flow is as follows:

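1. First, GET the original URL to obtain the cookies (session id).
2. Send a POST request to /reports/ci_report/server/request.php with some parameters, including the dates and the output format. The result is JSON containing an id, like this:
```
{"jrId": "jr_13879611"}
```
3. Send a GET request to /reports/ci_report/server/streamReport.php?jrId=jr_13879611, which returns the CSV data.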
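
To make the hand-off between the POST and the final GET concrete, here is a minimal offline sketch using only the standard library. It assumes the POST response body looks like the example above; `build_stream_url` is a hypothetical helper name for illustration, not something provided by the site or by requests.

```python
import json
from urllib.parse import urlencode

STREAM_REPORT_URL = "https://noms.wei-pipeline.com/reports/ci_report/server/streamReport.php"

def build_stream_url(response_body: str) -> str:
    """Turn the JSON body returned by request.php into the streamReport.php URL."""
    payload = json.loads(response_body)  # a dict with a single "jrId" key
    # urlencode turns the dict into "jrId=<value>"
    return f"{STREAM_REPORT_URL}?{urlencode(payload)}"

# Sample body mirroring the example response above
sample_body = '{"jrId": "jr_13879611"}'
stream_url = build_stream_url(sample_body)
print(stream_url)
```

With requests, passing the parsed JSON as `params=` to `Session.get` builds the same query string, so no manual URL assembly is needed in the real script.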

One parameter of the POST request needs the value of the menuitem query parameter from the original URL, so we parse the query string with urlparse to get it:

import requests
import time
import urllib.parse as urlparse
from urllib.parse import parse_qs
from datetime import datetime,timedelta

yesterday = datetime.now() - timedelta(days=1)
# e.g. "02-Oct-2020"; %b is the abbreviated month name (locale-dependent)
yesterday_date = yesterday.strftime("%d-%b-%Y")

original_url = "https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315"
parsed = urlparse.urlparse(original_url)

target_url = "https://noms.wei-pipeline.com/reports/ci_report/server/request.php"
stream_report_url = "https://noms.wei-pipeline.com/reports/ci_report/server/streamReport.php"

s = requests.Session()
# load the cookies
s.get(original_url)

# submit the report request; the JSON response contains the report id (jrId)
r = s.post(target_url,
    params = {
        "request.preventCache": int(round(time.time() * 1000))
    },
    data = {
        "ReportProc": "CIPR_DAILY_BULLETIN",
        "p_ci_id": parse_qs(parsed.query)['menuitem'][0],
        "p_opun": "PL",
        "p_gas_day_from": yesterday_date,
        "p_gas_day_to": yesterday_date,
        "p_output_option": "CSV"
})
# fetch the report; the jrId from the JSON response is passed as the query string
r = s.get(stream_report_url, params = r.json())
print(r.text)
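
The script above prints the CSV rather than saving it, while the question asks for a download. A minimal sketch of writing the text to a date-stamped file follows. `format_gas_day`, `save_csv`, and the `ci_report_` file-name prefix are hypothetical choices for illustration. The date helper uses a fixed English month table because `strftime` month names depend on the process locale; that the site expects English abbreviations is an assumption based on the date format used above.

```python
from datetime import date, timedelta
from pathlib import Path

# Fixed English abbreviations, so the result does not depend on the process locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def format_gas_day(day: date) -> str:
    """Format a date as DD-Mon-YYYY, the shape used for p_gas_day_from/to above."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year:04d}"

def save_csv(csv_text: str, day: date, directory: Path = Path(".")) -> Path:
    """Write the CSV text to <directory>/ci_report_<YYYY-MM-DD>.csv and return the path."""
    target = directory / f"ci_report_{day.isoformat()}.csv"
    target.write_text(csv_text, encoding="utf-8")
    return target

yesterday = date.today() - timedelta(days=1)
print(format_gas_day(yesterday))

# With the response from the script above (not executed here):
# path = save_csv(r.text, yesterday)
```

Using the ISO date in the file name keeps downloaded reports sorted chronologically in a directory listing.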


For this question (python - How to fill in date options and download from a radio-button form using Python requests), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64161080/
