I have a storage account kagsa1 with a container cont1 in it, and I need to access (mount) it from Databricks.
It works fine if I use the storage account key from Key Vault:
configs = {
"fs.azure.account.key.kagsa1.blob.core.windows.net":dbutils.secrets.get(scope = "kv-db1", key = "storage-account-access-key")
}
dbutils.fs.mount(
source = "wasbs://cont1@kagsa1.blob.core.windows.net",
mount_point = "/mnt/cont1",
extra_configs = configs)
dbutils.fs.ls("/mnt/cont1")
..but if I try to connect using Azure Active Directory credentials:
configs = {
"fs.azure.account.auth.type": "CustomAccessToken",
"fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}
dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
..it fails:
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: GET https://kagsa1.dfs.core.windows.net/cont1?resource=filesystem&maxResults=5000&timeout=90&recursive=false
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=AuthorizationPermissionMismatch
ErrorMessage=This request is not authorized to perform this operation using this permission.
The Databricks workspace tier is Premium,
the cluster has the Azure Data Lake Storage credential passthrough option enabled,
the storage account has the hierarchical namespace option enabled,
and the file system was initialized with:
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://cont1@kagsa1.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
What am I doing wrong?
Best Answer
Note: when performing the steps in the "Assign the application to a role" section, make sure to assign the Storage Blob Data Contributor role to the service principal.
As part of reproducing the issue, I granted the service principal the Owner role and tried to run dbutils.fs.ls("/mnt/azure/"), which returned the same error message as above.
Next, assign the Storage Blob Data Contributor role to the service principal.
Finally, after assigning the Storage Blob Data Contributor role to the service principal, I was able to get the output without any error message.
For more details, refer to "Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark".
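Once the role assignment has propagated, the same passthrough configuration from the question can also be used to mount the container rather than listing it directly. A minimal sketch, assuming it runs in a Databricks notebook on a cluster with credential passthrough enabled (the mount point name is illustrative):

```python
# Sketch: mount ADLS Gen2 via AAD credential passthrough.
# Assumes a Premium workspace and a passthrough-enabled cluster;
# spark and dbutils are only available inside a Databricks notebook.
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class":
    spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# The abfss URI matches the one used in the question.
dbutils.fs.mount(
  source = "abfss://cont1@kagsa1.dfs.core.windows.net/",
  mount_point = "/mnt/cont1",   # illustrative mount point
  extra_configs = configs)

dbutils.fs.ls("/mnt/cont1")
```

With this mount in place, the caller's own Azure AD identity is used for each access, so every user reading the mount still needs the Storage Blob Data Contributor (or equivalent data-plane) role on the storage account.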
Regarding "Azure Databricks: can't connect to Azure Data Lake Storage Gen2", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61100946/