I'm using Python, and I'm trying to get the link the CSV comes from when I click the DATA V CSV button at the bottom of this page.
I tried BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = 'https://www.ceps.cz/en/all-data#AktualniSystemovaOdchylkaCR'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# Find the link to the CSV file
csv_link = soup.find('a', string='DATA V CSV').get('href')
I also tried:
soup.find("button", {"id":"DATA V CSV"})
But it can't find the link behind DATA V CSV.
Best answer
To get all the data, you need to exactly mimic the requests sent to the server.
Here's how:
from shutil import copyfileobj
from urllib.parse import urlencode
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
    "referer": "https://www.ceps.cz/en/all-data",
    "accept": "application/json, text/javascript, */*; q=0.01",
    "cookie": "nette-samesite=1; ARRAffinity=3ee2404f26d0149d946e50cb3d4c22661f9f3b6510837fa538c67990a81979de; ARRAffinitySameSite=3ee2404f26d0149d946e50cb3d4c22661f9f3b6510837fa538c67990a81979de"
}
payload = {
    "do": "loadGraphData",
    "method": "AktualniSystemovaOdchylkaCR",
    "graph_id": "1026",
    "move_graph": "day",
    "download": "csv",
    "date_to": "2023-03-28T23:59:59",
    "date_from": "2023-03-28T00:00:00",
    "agregation": "MI",
    "date_type": "day",
    "interval": "false",
    "version": "bla",
    "function": "AVG",
}
all_data = "https://www.ceps.cz/en/all-data"
download_url = "https://www.ceps.cz/download-data/?format=csv"
with requests.Session() as s:
    # First request: ask the server to load the graph data, as the page's XHR does.
    headers.update({"x-requested-with": "XMLHttpRequest"})
    r = s.get(f"{all_data}?{urlencode(payload)}", headers=headers)
    print(r.json()["result"])
    # Second request: download the CSV within the same session.
    headers.pop("x-requested-with")
    with s.get(download_url, headers=headers, stream=True) as r, \
            open("data.csv", "wb") as f:
        copyfileobj(r.raw, f)
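The dates in the payload above are hard-coded. As a minimal sketch, assuming the server accepts the same parameters for other days (not verified here), a small helper could build the payload and the request URL for any date. `build_payload` and `build_url` are hypothetical names, not part of the original answer.

```python
from datetime import date
from urllib.parse import urlencode

ALL_DATA_URL = "https://www.ceps.cz/en/all-data"


def build_payload(day: date) -> dict:
    """Return the query parameters for one full day (hypothetical helper)."""
    iso = day.isoformat()
    return {
        "do": "loadGraphData",
        "method": "AktualniSystemovaOdchylkaCR",
        "graph_id": "1026",
        "move_graph": "day",
        "download": "csv",
        "date_to": f"{iso}T23:59:59",
        "date_from": f"{iso}T00:00:00",
        "agregation": "MI",  # spelling kept as in the original payload
        "date_type": "day",
        "interval": "false",
        "version": "bla",
        "function": "AVG",
    }


def build_url(day: date) -> str:
    """Return the full URL for the first (XHR) request (hypothetical helper)."""
    return f"{ALL_DATA_URL}?{urlencode(build_payload(day))}"
```

Calling `build_url(date(2023, 3, 28))` reproduces the URL used in the first `s.get` call above; whether other dates work the same way is an assumption.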
You should get a semicolon-separated file, saved as data.csv.
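A semicolon-separated file can be read with the standard `csv` module by setting the delimiter. The sketch below uses a made-up two-column sample; the real column names, any extra header lines, and the encoding of the downloaded file are not known here and may differ. `read_semicolon_csv` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample for illustration only; not actual data from the site.
SAMPLE = "timestamp;value\n2023-03-28 00:00;1.5\n2023-03-28 00:01;-2.0\n"


def read_semicolon_csv(text: str) -> list[dict]:
    """Parse semicolon-separated text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


rows = read_semicolon_csv(SAMPLE)
for row in rows:
    print(row["timestamp"], row["value"])
```

For the downloaded file, the same `csv.DictReader(..., delimiter=";")` call can be applied to a file object opened with `open("data.csv", newline="")`, adjusting for whatever header layout the file actually has.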
Source: the Stack Overflow question "python - How to programmatically get the CSV link behind a JavaScript page?": https://stackoverflow.com/questions/75863105/