python - How to call the Databricks REST API to list job runs

Tags: python azure azure-databricks databricks-workflows databricks-rest-api

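I am writing a Python script to retrieve the complete list of job runs that executed yesterday. However, I have run into a problem with the token-based pagination in my script: although I loop through the pages, every request returns the same output.

Here is the code: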

import requests
import pandas as pd
import math
import datetime
import json

def fetch_and_process_job_runs(base_uri, api_token, params):
    endpoint = '/api/2.1/jobs/runs/list'
    headers = {'Authorization': f'Bearer {api_token}'}
    
    all_data = []  # To store all the data from multiple pages
    
    while True:
#         print(params)
        response = requests.get(base_uri + endpoint, headers=headers, params=params)
        response_json = response.json()
        
        data = []
        for run in response_json["runs"]:
            start_time_ms = run["start_time"]
            start_time_seconds = start_time_ms / 1000
            start_time_readable = datetime.datetime.fromtimestamp(start_time_seconds).strftime('%Y-%m-%d %H:%M:%S')
            data.append({
                "job_id": run["job_id"],
                "creator_user_name": run["creator_user_name"],
                "run_name": run["run_name"],
                "run_page_url": run["run_page_url"],
                "run_id": run["run_id"],
                "execution_duration_in_mins": math.ceil(int(run.get('execution_duration')) / (1000 * 60)),
                "result_state": run["state"].get("result_state"),
                "start_time": start_time_readable
            })
        
        all_data.extend(data)
        df = pd.DataFrame(all_data)
        print(df)
        
        if response_json.get("has_more") == True:
            next_page_token = response_json.get("next_page_token")
            params['next_page_token'] = next_page_token
        else:
            break
    
    df = pd.DataFrame(all_data)
    return df

# Replace with your actual values
now = datetime.datetime.now(datetime.timezone.utc)  # timezone-aware, so .timestamp() is not read as local time
yesterday = now - datetime.timedelta(days=1)
start_time_from = int(yesterday.replace(hour=0, minute=0, second=0, microsecond=0).timestamp()) * 1000
start_time_to = int(yesterday.replace(hour=23, minute=59, second=59, microsecond=999999).timestamp()) * 1000
        
params = {
#      "start_time_from": start_time_from,
#      "start_time_to": start_time_to,
     "expand_tasks": True
}
baseURI = 'https://adb-xxxxxxxxxxxxxx.azuredatabricks.net'
apiToken = 'xxxxxxxxxxxxxxxxxxxxxxxxxx'

result_df = fetch_and_process_job_runs(baseURI, apiToken, params)
print(result_df)

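Please help me.

Best answer

I noticed that the value of next_page_token in the API response was not changing, and then found a very small mistake in your code. The parameter to pass in the request is page_token, not next_page_token.

According to the documentation at https://docs.databricks.com/api/workspace/jobs/list:

page_token string

Use next_page_token or prev_page_token returned from the previous request to list the next or previous page of jobs respectively.

So params['next_page_token'] needs to be changed to params['page_token'].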
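As a minimal sketch of the corrected loop, the pagination can be separated from the HTTP call. The names `collect_runs` and `fetch_page` are hypothetical and not part of the Databricks API; `fetch_page` stands for any callable that takes the query parameters and returns the decoded JSON of one runs/list response. The sketch assumes the response carries the has_more and next_page_token fields that the question's code already reads.

```python
from typing import Any, Callable, Dict, List


def collect_runs(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Follow next_page_token until the API reports no further pages.

    fetch_page is a hypothetical callable: it takes the query parameters
    and returns the decoded JSON body of one runs/list response.
    """
    query = dict(params)  # copy so the caller's dict is not mutated
    runs: List[Dict[str, Any]] = []
    while True:
        page = fetch_page(query)
        # Defensive default in case a page carries no "runs" key
        runs.extend(page.get("runs", []))
        token = page.get("next_page_token")
        if not page.get("has_more") or not token:
            break
        # The request parameter is page_token, not next_page_token
        query["page_token"] = token
    return runs
```

With the variables from the question, `fetch_page` could be `lambda q: requests.get(base_uri + endpoint, headers=headers, params=q).json()`, and the returned list can then be turned into a DataFrame as before.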

Regarding "python - How to call the Databricks REST API to list job runs", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76892500/
