azure - Error when mounting ADLS on DBFS in Databricks (Error: NullPointerException)

Tags: azure azure-active-directory databricks azure-data-lake

I am trying to mount Azure Data Lake Gen 2 in Databricks, but I get the error below.

java.lang.NullPointerException: authEndpoint

The code I am using is shown below:

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.auth.provider.type": "org.apache.hadoop.fs.azurebfs.ClientCredsTokenProvider",
  "fs.azure.account.auth2.client.id": "<client-id>",
  "fs.azure.account.auth2.client.secret": dbutils.secrets.get(scope = "scope1", key = "kvsecretfordbricks"),
  "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}

dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/demo",
    extra_configs = configs)

The full error is given below:

---------------------------------------------------------------------------
ExecutionError                            Traceback (most recent call last)
 in 
      9 source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
     10 mount_point = "/mnt/demo",
---> 11 extra_configs = configs)

/local_disk0/tmp/1612619970782-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312                 exc.__context__ = None
    313                 exc.__cause__ = None
--> 314                 raise exc
    315         return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o271.mount.
: java.lang.NullPointerException: authEndpoint
	at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
	at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:84)
	at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
	at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
	at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
	at sun.reflect.GeneratedMethodAccessor292.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:251)
	at java.lang.Thread.run(Thread.java:748)

Any help would be appreciated.

When I run

dbutils.fs.unmount("/mnt")

there are no mount points beginning with "/mnt".
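
As a sanity check, here is a minimal sketch of listing the registered mounts and unmounting only the specific path used in this question (dbutils.fs.mounts() and dbutils.fs.unmount() are the standard DBFS utilities; the loop is just illustrative):

# List every mount point currently registered in DBFS
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# Unmount only the mount point used in this question, if it already exists
if any(m.mountPoint == "/mnt/demo" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/demo")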

--

Update

After updating dfs.adls.oauth2.refresh.url to fs.azure.account.oauth2.client.endpoint, I get a different error message:

---------------------------------------------------------------------------
ExecutionError                            Traceback (most recent call last)
 in 
      9 source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
     10 mount_point = "/mnt/demo",
---> 11 extra_configs = configs)

/local_disk0/tmp/1612858508533-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312                 exc.__context__ = None
    313                 exc.__cause__ = None
--> 314                 raise exc
    315         return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o275.mount.
: java.lang.NullPointerException: clientId
	at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
	at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:85)
	at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
	at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
	at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:251)
	at java.lang.Thread.run(Thread.java:748)

Best answer

If you want to mount an Azure Data Lake Storage Gen2 account to DBFS, update dfs.adls.oauth2.refresh.url to fs.azure.account.oauth2.client.endpoint. For more details, please refer to the official documentation.
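
The clientId NullPointerException in the update has the same cause: the client id and secret in the question use the fs.azure.account.auth2.* prefix, while the ABFS OAuth provider reads fs.azure.account.oauth2.*, so the client id resolves to null. A minimal sketch of the config with the expected key names (placeholder values; the secret scope and key names are taken from the question):

configs = {
  "fs.azure.account.auth.type": "OAuth",
  # provider key is fs.azure.account.oauth.provider.type; the class lives in the oauth2 package
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  # client id and secret keys use the oauth2 prefix, not auth2
  "fs.azure.account.oauth2.client.id": "<client-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="scope1", key="kvsecretfordbricks"),
  # the token endpoint key replaces dfs.adls.oauth2.refresh.url
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}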

For example:

  1. Create an Azure Data Lake Storage Gen2 account.
az login
az storage account create \
    --name <account-name> \
    --resource-group <group name> \
    --location westus \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
  2. Create a service principal and assign it the Storage Blob Data Contributor role scoped to the Data Lake Storage Gen2 storage account.
az login

az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>

  3. Create a Spark cluster in Azure Databricks.

  4. Mount Azure Data Lake Gen2 in Azure Databricks (Python).

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/demo",
  extra_configs = configs)
    


  5. Check that the mount works (a short read example follows the list).
dbutils.fs.ls("/mnt/demo")
    

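Once the mount is in place, a short usage sketch for reading a file through it (the file name demo.csv is a hypothetical placeholder; spark.read, display(), and dbutils.fs.ls are the standard notebook APIs):

# Confirm the mount is visible
display(dbutils.fs.ls("/mnt/demo"))

# Read a CSV file from the mounted ADLS Gen2 container (demo.csv is a placeholder)
df = spark.read.option("header", "true").csv("/mnt/demo/demo.csv")
df.show(5)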

    Regarding azure - Error when mounting ADLS on DBFS in Databricks (Error: NullPointerException), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66078621/
