scala - Spark : Create temporary table by executing sql query on temporary tables

I am using Spark and would like to know: how can I create a temporary table named C by executing a SQL query on tables A and B?

sqlContext
   .read.json(file_name_A)
   .createOrReplaceTempView("A")

sqlContext
   .read.json(file_name_B)
   .createOrReplaceTempView("B")

val tableQuery = "(SELECT A.id, B.name FROM A INNER JOIN B ON A.id = B.fk_id) C"

sqlContext.read
   .format(SQLUtils.FORMAT_JDBC)
   .options(SQLUtils.CONFIG())
   .option("dbtable", tableQuery)
   .load()

Best answer

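You need to save the result as a temporary table. Note that tableQuery is only a String, so the query has to be run through sqlContext.sql first; the resulting DataFrame can then be registered under the name C:

sqlContext.sql("SELECT A.id, B.name FROM A INNER JOIN B ON A.id = B.fk_id")
   .createOrReplaceTempView("C")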
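
A minimal end-to-end sketch of the idea above, assuming Spark 2.x or later with a local SparkSession. The object name and the in-memory sample rows are hypothetical and only stand in for the JSON files file_name_A and file_name_B from the question. The join runs inside Spark, because the JDBC "dbtable" option is sent to the remote database, which knows nothing about views registered in Spark.

```scala
import org.apache.spark.sql.SparkSession

object TempViewFromQuery {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption); a real job usually has one already.
    val spark = SparkSession.builder()
      .appName("temp-view-from-query")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for file_name_A and file_name_B.
    Seq(1, 2, 3).toDF("id").createOrReplaceTempView("A")
    Seq((1, "alice"), (3, "carol")).toDF("fk_id", "name").createOrReplaceTempView("B")

    // Run the join on the Spark side and register the result as temporary view C.
    spark.sql("SELECT A.id, B.name FROM A INNER JOIN B ON A.id = B.fk_id")
      .createOrReplaceTempView("C")

    // C can now be queried like any other temporary view.
    spark.sql("SELECT id, name FROM C ORDER BY id").show()

    spark.stop()
  }
}
```

The view C lives only as long as the session that created it, which is why the JDBC write that follows is needed for anything permanent.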

For permanent storage in an external table, you can use JDBC:

val prop = new java.util.Properties
prop.setProperty("driver", "com.mysql.jdbc.Driver")
prop.setProperty("user", "vaquar")
prop.setProperty("password", "khan")

// JDBC MySQL URL; the destination database is named "temp"
val url = "jdbc:mysql://localhost:3306/temp"

// Destination database table
val dbtable = "sample_data_table"

// Write the Spark DataFrame to the database; df is the DataFrame to persist,
// for example the result of the join query
df.write.mode("append").jdbc(url, dbtable, prop)
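
Once the data has been written, it can be read back over JDBC and registered as a temporary view again. The following is only a sketch: the object name, the helper registerJdbcTable, and the viewName parameter are hypothetical, and it assumes the same url, dbtable, and prop values as the write above, with the MySQL driver on the classpath.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcRoundTrip {
  // Reads a JDBC table into a DataFrame and registers it as a temporary view.
  // url, dbtable and prop are expected to match the values used for the write;
  // viewName is a hypothetical name chosen by the caller.
  def registerJdbcTable(spark: SparkSession, url: String, dbtable: String,
                        prop: Properties, viewName: String): DataFrame = {
    val df = spark.read.jdbc(url, dbtable, prop)
    df.createOrReplaceTempView(viewName)
    df
  }
}
```

A call such as JdbcRoundTrip.registerJdbcTable(spark, url, dbtable, prop, "sample_data") would then make the stored rows available to spark.sql under the name sample_data.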

https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
http://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables

A similar question about "scala - Spark : Create temporary table by executing sql query on temporary tables" can be found on Stack Overflow: https://stackoverflow.com/questions/50949806/
