azure - Unable to create mount point for ADLS Gen2 in Databricks

Tags: azure databricks azure-databricks

We are trying to create a mount point from Azure Databricks to ADLS Gen2 via a service principal. The service principal has the appropriate resource-level and data-level access. Although we have confirmed that ADLS Gen2 is reachable with an access key, the mount point has not yet been created. Azure Databricks VNet injection is in use.

The mount fails with an uninformative error. A firewall inspects all traffic from Databricks, so we assume something the mount requires (the OAuth service or an Azure AD API) is being blocked. We have confirmed that Databricks can connect to the file system, but creating the mount point with the service principal fails. We do not know which HTTPS or other services Azure Databricks must be able to reach in order to create a mount point; we believe that unblocking those service endpoints would let the mount succeed. Currently, only login.microsoftonline.com is allowed.
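To narrow down what the firewall is blocking, a quick reachability check can be run from a notebook cell before retrying the mount. This is a minimal sketch; the storage account host name is a placeholder and should be replaced with the real DFS endpoint.

import socket

# Hosts the OAuth-based ABFS mount needs to reach (storage account name is a placeholder).
endpoints = [
    ("login.microsoftonline.com", 443),            # Azure AD token endpoint
    ("storageaccount.dfs.core.windows.net", 443),  # ADLS Gen2 DFS endpoint
]

for host, port in endpoints:
    try:
        # A TCP connect timeout here mirrors the SocketTimeoutException thrown by dbutils.fs.mount.
        with socket.create_connection((host, port), timeout=10):
            print("OK      {}:{}".format(host, port))
    except OSError as exc:
        print("BLOCKED {}:{} -> {}".format(host, port, exc))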

# Mount point for ADLS Gen2 via service principal
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id":  "XXXXXX", 
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "XXXX-scope", key = "XXXX-key"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/XXXXX/oauth2/token"}

dbutils.fs.mount(
  source = "abfss://<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="a2c4cbcec7d1dbd1d6c7cfe2d1d6cdd0c3c5c7c3c1c1cdd7ccd68cc6c4d18cc1cdd0c78cd5cbccc6cdd5d18cccc7d6" rel="noreferrer noopener nofollow">[email protected]</a>/",
  mount_point = "/mnt/XXXX",
  extra_configs = configs)


We expect the mount point to be created successfully. Error below:

ExecutionError: An error occurred while calling o220.mount.
: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:666)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:259)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenSingleCall(AzureADAuthenticator.java:256)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenCall(AzureADAuthenticator.java:211)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:94)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

Best answer

Make sure you have provided valid service principal details, i.e. (appId, password, tenant).

Azure Data Lake Storage Gen2 mount configuration:

configs = {"fs.azure.account.auth.type": "OAuth",
       "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
       "fs.azure.account.oauth2.client.id": "<appId>",
       "fs.azure.account.oauth2.client.secret": "<password>",
       "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
       "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
mount_point = "/mnt/flightdata",
extra_configs = configs)

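If the call succeeds, the new mount appears in the workspace mount list. A quick verification sketch (the mount point name follows the example above):

# List all mounts and confirm /mnt/flightdata is present.
for m in dbutils.fs.mounts():
    if m.mountPoint == "/mnt/flightdata":
        print(m.mountPoint, "->", m.source)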

Access files in the file system as if they were local files:

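For example, listing the mount and reading a file through the mount path (a sketch; the CSV file name is hypothetical):

# List the contents of the mounted folder.
display(dbutils.fs.ls("/mnt/flightdata"))

# Read a hypothetical CSV that sits under the mounted folder.
df = spark.read.csv("/mnt/flightdata/flights.csv", header=True)
display(df)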

Reference: Tutorial: Access Data Lake Storage Gen2 data with Azure Databricks using Spark

Hope this helps.

Regarding "azure - Unable to create mount point for ADLS Gen2 in Databricks", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58222800/
