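scala - What is the right way to check whether a folder exists in an ADLS Gen2 account

I'm working in a Scala and Spark environment where I want to read a Parquet file. Before reading it, I want to check whether the file exists. I'm running the following code in a Jupyter notebook, but it doesn't work: no DataFrame is displayed, because the function testDirExist returns false.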

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val hadoopfs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)

def testDirExist(path: String): Boolean = {
  val p = new Path(path)
  hadoopfs.exists(p) && hadoopfs.getFileStatus(p).isDirectory
}
val pt = "abfss://container@account.dfs.core.windows.net/blah/blah/blah"

val exists = testDirExist(pt)
if (exists) {
  val dataframe = spark.read.parquet(pt)
  dataframe.show()
}

However, the following code works. It displays the DataFrame:

val k = spark.read.parquet("abfss://container@account.dfs.core.windows.net/blah/blah/blah")
k.show()

Can anyone help me check whether the file exists?

Thanks

Best answer

You just need to set the default file system to your storage account:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.fs.Path
    import java.io.PrintWriter

    val conf = new Configuration()
    // Make the ADLS Gen2 container the default file system
    conf.set("fs.defaultFS", "abfss://<container_name>@<account_name>.dfs.core.windows.net")
    // Account-level ABFS settings are keyed by the storage account host name, not the container
    conf.set("fs.azure.account.auth.type.<account_name>.dfs.core.windows.net", "OAuth")
    conf.set("fs.azure.account.oauth.provider.type.<account_name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    conf.set("fs.azure.account.oauth2.client.id.<account_name>.dfs.core.windows.net", "<client_id>")
    conf.set("fs.azure.account.oauth2.client.secret.<account_name>.dfs.core.windows.net", "<secret>")
    conf.set("fs.azure.account.oauth2.client.endpoint.<account_name>.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant_id>/oauth2/token")

    // FileSystem.get(conf) returns the file system named by fs.defaultFS
    val fs = FileSystem.get(conf)
    // Write a small test file at the root of the container
    val ostream = fs.create(new Path("/abfss_test.out"))
    val pwriter = new PrintWriter(ostream)
    try {
      pwriter.write("Azure Datalake Gen2 test")
      pwriter.write("\n")
    }
    finally {
      pwriter.close()
    }
    // check if the file we've just created exists
    println(fs.exists(new Path("/abfss_test.out")))
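
If changing fs.defaultFS is not desirable, an alternative worth trying is to resolve the file system from the path itself. FileSystem.get(conf) returns the file system for fs.defaultFS, whereas Path.getFileSystem(conf) picks the implementation matching the scheme and authority of the path's URI. The sketch below is only an illustration under that assumption: the helper name dirExists is hypothetical, and it presumes the credentials for the storage account are already present in the Configuration that is passed in.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper (not from the original answer): checks that `path`
// exists and is a directory, using the file system implied by the path's
// own URI (e.g. abfss://container@account...) rather than fs.defaultFS.
def dirExists(path: String, conf: Configuration): Boolean = {
  val p = new Path(path)
  // Resolves the FileSystem from the URI scheme and authority of `p`
  val fs = p.getFileSystem(conf)
  fs.exists(p) && fs.getFileStatus(p).isDirectory
}
```

In the notebook from the question this could be called as dirExists(pt, spark.sparkContext.hadoopConfiguration), so that the check uses the same Hadoop configuration that spark.read.parquet already relies on.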

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/60086978/
