Azure Databricks: accessing Azure Data Lake Storage Gen2 through a service principal

Tags: azure, azure-databricks

I want to access Azure Data Lake Storage Gen2 from an Azure Databricks cluster through a service principal, so that I can get rid of the storage account access key.
I followed https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2#--mount-an-azure-data-lake-storage-gen2-account-using-a-service-principal-and-oauth-20
...but the documented snippet apparently still uses the storage account access key (see the screenshot in the original post).

If the storage account access key is still required, what is the purpose of the service principal?
The main question is: is it possible to get rid of the storage account access key entirely and use only the service principal?

Best answer

This is a documentation bug, and I am getting it fixed right away.

The line should read: dbutils.secrets.get(scope = "<scope-name>", key = "<key-name-for-service-credential>") retrieves your service credential that has been stored as a secret in a secret scope.

Python: mount an Azure Data Lake Storage Gen2 file system by passing the values directly

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "0xxxxxxxxxxxxxxxxxxxxxxxxxxf",  # <appId> = application (client) ID
           "fs.azure.account.oauth2.client.secret": "Arxxxxxxxxxxxxxxxxxxxxy7].vX7bMt]*",  # <password> = client secret created in Azure AD
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/72fxxxxxxxxxxxxxxxxxxxxxxxxb47/oauth2/token",  # <tenant> = tenant (directory) ID
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "abfss://filesystem@chepragen2.dfs.core.windows.net/flightdata",  # abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<path>
    mount_point = "/mnt/flightdata",
    extra_configs = configs)
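Since the same three service-principal values feed both the config dict and the token endpoint, it can help to build them with a small helper instead of scattering the IDs through the notebook. This is a sketch of my own (the function names and parameters are not from the linked docs), producing exactly the config keys used above:

```python
# Hypothetical helpers (names are my own): build the OAuth config dict and
# the abfss:// source URL from the service-principal values.

def oauth_configs(app_id, client_secret, tenant_id, create_fs=True):
    """Return the extra_configs dict for an OAuth (service principal) mount."""
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": app_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    if create_fs:
        # Creates the file system on first use if it does not exist yet.
        configs["fs.azure.createRemoteFileSystemDuringInitialization"] = "true"
    return configs

def abfss_source(container, storage_account, path=""):
    """Return the abfss:// URL for a container in an ADLS Gen2 account."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path}"

# In a Databricks notebook the mount would then be (illustrative only):
# dbutils.fs.mount(
#     source = abfss_source("filesystem", "chepragen2", "flightdata"),
#     mount_point = "/mnt/flightdata",
#     extra_configs = oauth_configs(app_id, client_secret, tenant_id))
```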


Python: mount an Azure Data Lake Storage Gen2 file system by retrieving the service credential from a secret scope with dbutils.secrets

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "06xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0ef",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "chepra", key = "service-credential"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/72xxxxxxxxxxxxxxxxxxxx011db47/oauth2/token"}

dbutils.fs.mount(
    source = "abfss://filesystem@chepragen2.dfs.core.windows.net/flightdata",
    mount_point = "/mnt/flightdata",
    extra_configs = configs)


Hope this helps. Do let us know if you have any further queries.

A similar question about accessing Azure Data Lake Storage Gen2 from Azure Databricks through a service principal can be found on Stack Overflow: https://stackoverflow.com/questions/61105766/
