azure - Passing Databricks ClusterID at runtime from Azure Data Bricks Pipeline

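I want to make the Azure linked service configurable so that the Databricks WorkspaceURL and ClusterID are passed at runtime. I will have multiple Spark clusters, and I will decide which type/size of cluster to invoke based on cluster size.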

I could not find an option to get the Databricks ClusterID from an ADF pipeline and pass it on.

Best answer

You can use the REST API (Clusters API 2.0) to get the list of clusters.

https://adb-7012303279496007.7.azuredatabricks.net/api/2.0/clusters/list

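I reproduced the above and got the results below.

First, generate an access token in the Databricks workspace, then use that token as the authorization in a Web activity to get the list of clusters.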
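
Outside ADF, the same call can be sketched in Python to check the token and inspect the response before wiring up the Web activity. This is a minimal sketch using only the standard library. The workspace URL is the one shown above; the function names are hypothetical, and it assumes the token is sent as a Bearer token and that the response carries a "clusters" array.

```python
import json
import urllib.request

# Workspace URL taken from the answer; replace with your own workspace.
WORKSPACE_URL = "https://adb-7012303279496007.7.azuredatabricks.net"


def build_list_clusters_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Clusters API 2.0 list endpoint."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    # Assumption: the Databricks access token is passed as a Bearer token.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def list_clusters(workspace_url: str, token: str) -> list:
    """Call the endpoint and return the 'clusters' array (empty if absent)."""
    request = build_list_clusters_request(workspace_url, token)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload.get("clusters", [])
```

In the Web activity, the equivalent settings would be the same URL, method GET, and an Authorization header with the value "Bearer" followed by the token.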

[Screenshot: output of the Web activity]

The output above also includes the cluster size (in MB). Store it in an array variable.

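To get the required cluster ID based on cluster size, use a Filter activity with a condition that matches your requirements.

Here, as an example, I use the cluster size in MB as the filter condition.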
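
The filtering logic can be sketched in Python as follows. This is only an illustration of what the Filter activity does, not the ADF configuration itself. It assumes each cluster entry exposes its memory in a field named cluster_memory_mb; the sample data, the threshold, and the function name are made up.

```python
def filter_by_memory(clusters: list, min_memory_mb: int) -> list:
    """Keep clusters whose cluster_memory_mb is at least min_memory_mb.

    Entries without the field are treated as 0 MB and therefore dropped
    for any positive threshold.
    """
    return [c for c in clusters if c.get("cluster_memory_mb", 0) >= min_memory_mb]


# Made-up sample shaped like entries of the "clusters" array.
sample_clusters = [
    {"cluster_id": "0000-000000-small000", "cluster_name": "small", "cluster_memory_mb": 14336},
    {"cluster_id": "0000-000000-large000", "cluster_name": "large", "cluster_memory_mb": 57344},
]

large_clusters = filter_by_memory(sample_clusters, 32000)
print([c["cluster_id"] for c in large_clusters])
```

In the Filter activity, the condition would be an expression over item(), for example a comparison on item().cluster_memory_mb, with the array variable as the Items input.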

Notebook linked service:

Add a cluster_id parameter to it.
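
A parameterized Azure Databricks linked service might look roughly like the definition below, written as a Python dict so it can be dumped to JSON. Treat it as a sketch under assumptions: the original only shows the cluster_id parameter in a screenshot, so the linked service name, the workspace_url parameter, the token placeholder, and the exact property layout are not taken from the answer and should be checked against your own linked service JSON.

```python
import json


def build_linked_service(name: str) -> dict:
    """Return a hypothetical parameterized Azure Databricks linked service."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDatabricks",
            # Parameters supplied by the pipeline at runtime.
            "parameters": {
                "workspace_url": {"type": "String"},  # assumption: not shown in the answer
                "cluster_id": {"type": "String"},
            },
            "typeProperties": {
                # Assumption: linked service parameters are referenced
                # with the @linkedService() expression.
                "domain": "@linkedService().workspace_url",
                "existingClusterId": "@linkedService().cluster_id",
                # Placeholder only; keep real tokens in a secret store.
                "accessToken": {"type": "SecureString", "value": "<access-token>"},
            },
        },
    }


print(json.dumps(build_linked_service("ls_databricks_dynamic"), indent=2))
```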

Pass the required cluster_id from the filtered array, as shown below.

@activity('Filter1').output.Value[0].cluster_id
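
The expression takes the first element of the Filter output, so it presumably fails when the filter matches nothing. The Python sketch below mirrors the same lookup with an explicit empty check. The function name and the sample dict are hypothetical; only the Value key and the cluster_id field come from the expression above.

```python
def first_cluster_id(filter_output: dict) -> str:
    """Mirror output.Value[0].cluster_id, but fail with a clear message if empty."""
    matches = filter_output.get("Value", [])
    if not matches:
        raise LookupError("Filter returned no clusters; nothing to pass as cluster_id")
    return matches[0]["cluster_id"]


# Made-up Filter output with a single matching cluster.
sample_output = {"Value": [{"cluster_id": "0000-000000-large000", "cluster_memory_mb": 57344}]}
print(first_cluster_id(sample_output))
```

If several clusters can match the condition, the first one in the list is used, so the condition should be narrow enough to make that choice predictable.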

You can specify the notebook path using dynamic content.

[Screenshot: my pipeline execution]

Regarding azure - Passing Databricks ClusterID at runtime from Azure Data Bricks Pipeline, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/74207811/