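We have an audit requirement to provide insight into who executed what query at what moment in Azure Databricks. The Azure Databricks / Spark UI / Jobs tab already lists the Spark jobs that were executed, including the query performed and the time it was submitted. But it does not show who executed the query.
- Is there an API that can be used with Azure Databricks to query the Spark job details shown in the UI? (The Databricks REST API does not seem to offer this, but maybe I overlooked something.)
- Is there any way to determine who created a Spark job (using the API)?
Thanks, 下罗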
Best answer
1. Accessing the Spark API
a. Access to the Azure Databricks Spark API from the driver node (internal):
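import requests

# Driver host and Spark UI port, read from the cluster's Spark configuration
driverIp = spark.conf.get("spark.driver.host")
port = spark.conf.get("spark.ui.port")

# Spark monitoring REST API, served by the driver's Spark UI
url = f"http://{driverIp}:{port}/api/v1/applications"
r = requests.get(url, timeout=3.0)
r.status_code, r.text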
Use this route if, for example, you received this error message from the public API:
PERMISSION_DENIED: Traffic on this port is not permitted
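The request above only lists the applications; the jobs of each application can be fetched in a second step. The following is a minimal sketch, assuming the standard Spark monitoring endpoints `/api/v1/applications` and `/api/v1/applications/<app-id>/jobs`. The helper names `fetch_json` and `list_jobs` are hypothetical, and the base URL would be built from `driverIp` and `port` as shown above.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=3.0):
    """GET a URL and decode the JSON body (standard library only)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_jobs(base_url, fetch=fetch_json):
    """Return {application id: [job records]} for every application.

    base_url is e.g. "http://<driverIp>:<port>/api/v1". `fetch` is
    injectable so the function can be tried without a live cluster.
    """
    jobs_by_app = {}
    for app in fetch(f"{base_url}/applications"):
        app_id = app["id"]
        jobs_by_app[app_id] = fetch(f"{base_url}/applications/{app_id}/jobs")
    return jobs_by_app
```

On the driver this could be called as `list_jobs(f"http://{driverIp}:{port}/api/v1")`.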
b. External access to the Azure Databricks Spark API:
import requests
import json
"""
Program access to Databricks Spark UI.
Works external to Databricks environment or running within.
Requires a Personal Access Token. Treat this like a password, do not store in a notebook. Please refer to the Secrets API.
This Python code requires F string support.
"""
# https://<databricks-host>/driver-proxy-api/o/0/<cluster_id>/<port>/api/v1/applications/<application-id-from-master-spark-ui>/stages/<stage-id>
port = spark.conf.get("spark.ui.port")
clusterId = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
host = "eastus2.azuredatabricks.net"
workspaceId = "999999999999111" # follows the 'o=' in the databricks URLs or zero
token = "dapideedeadbeefdeadbeefdeadbeef68ee3" # Personal Access token
url = F"https://{host}/driver-proxy-api/o/{workspaceId}/{clusterId}/{port}/api/v1/applications/?status=running"
r = requests.get(url, auth=("token", token))
# print Application list response
print(r.status_code, r.text)
applicationId = r.json()[0]['id'] # assumes only one response
url = F"https://{host}/driver-proxy-api/o/{workspaceId}/{clusterId}/{port}/api/v1/applications/{applicationId}/jobs"
r = requests.get(url, auth=("token", token))
print(r.status_code, r.json())
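For the "what query, at what moment" part of the audit, the job list returned by the last request can be reduced to a few fields. This is only a sketch: `summarize_jobs` is a hypothetical helper, the field names (`jobId`, `name`, `description`, `submissionTime`, `status`, `jobGroup`) are assumed to match the Spark monitoring API's job records, and the sample data is invented for illustration. None of these fields carries the user identity.

```python
def summarize_jobs(jobs):
    """Reduce Spark job records to the fields relevant for an audit trail."""
    rows = []
    for job in jobs:
        rows.append({
            "jobId": job["jobId"],
            "status": job.get("status"),
            "submissionTime": job.get("submissionTime"),
            # description and jobGroup are optional; fall back to the job name
            "description": job.get("description") or job.get("name"),
            "jobGroup": job.get("jobGroup"),
        })
    # Oldest first; records without a submission time sort to the front
    return sorted(rows, key=lambda row: row["submissionTime"] or "")


# Invented sample shaped like a /jobs response, for illustration only
sample = [
    {"jobId": 1, "name": "collect at <command>",
     "submissionTime": "2020-05-20T10:05:00.000GMT", "status": "SUCCEEDED"},
    {"jobId": 0, "name": "count at <command>",
     "description": "SELECT count(*) FROM t", "jobGroup": "g1",
     "submissionTime": "2020-05-20T10:00:00.000GMT", "status": "SUCCEEDED"},
]

for row in summarize_jobs(sample):
    print(row)
```

With real data, `summarize_jobs(r.json())` would take the place of the sample.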
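2. Sorry, no, not at the moment.
You can look at the cluster logs, but the user identity is not recorded there.
Vote for and track this idea: https://ideas.databricks.com/ideas/DBE-I-313
How to access the Ideas Portal: https://docs.databricks.com/ideas.html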
Regarding "Azure Databricks : create audit trail for who ran what query at what moment", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61910208/