apache-spark - Using TestHiveContext/HiveContext in unit tests

Tags: apache-spark hive apache-spark-sql hivecontext

I am trying to do this in a unit test:

val sConf = new SparkConf()
  .setAppName("RandomAppName")
  .setMaster("local")
val sc = new SparkContext(sConf)
val sqlContext = new TestHiveContext(sc)  // tried new HiveContext(sc) as well

But I get:
[scalatest] Exception encountered when invoking run on a nested suite - java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient *** ABORTED ***
[scalatest]   java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
[scalatest]   at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
[scalatest]   at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:120)
[scalatest]   at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:163)
[scalatest]   at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:161)
[scalatest]   at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:168)
[scalatest]   at org.apache.spark.sql.hive.test.TestHiveContext.<init>(TestHive.scala:72)
[scalatest]   at mypackage.NewHiveTest.beforeAll(NewHiveTest.scala:48)
[scalatest]   at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
[scalatest]   at mypackage.NewHiveTest.beforeAll(NewHiveTest.scala:35)
[scalatest]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
[scalatest]   at mypackage.NewHiveTest.run(NewHiveTest.scala:35)
[scalatest]   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1491)

The code runs fine when I launch it with spark-submit, but not from the unit test.
How can I fix this for unit tests?
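For context, here is a hypothetical reconstruction of the failing suite implied by the stack trace above; only the class name NewHiveTest and the fact that the contexts are created in beforeAll come from the trace, everything else is assumed:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.test.TestHiveContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class NewHiveTest extends FunSuite with BeforeAndAfterAll {

  private var sc: SparkContext = _
  private var hc: TestHiveContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    val sConf = new SparkConf()
      .setAppName("RandomAppName")
      .setMaster("local")
    sc = new SparkContext(sConf)
    hc = new TestHiveContext(sc)  // the HiveMetaStoreClient exception is thrown here
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
    super.afterAll()
  }

  test("some Hive-backed query") {
    /* queries against hc would go here */
  }
}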

Best answer

This is an old question, but I ran into a similar problem and ended up using spark-testing-base:

import com.holdenkarau.spark.testing.SharedSparkContext
import org.apache.spark.sql.hive.test.TestHiveContext
import org.scalatest.FunSuite

class RowToProtoMapper$Test extends FunSuite with SharedSparkContext {

  // `sc` is the SparkContext provided by the SharedSparkContext trait,
  // created once and shared by every test in the suite.
  test("route mapping") {
    val hc = new TestHiveContext(sc)
    /* Some test */
  }
}
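For this to compile, spark-testing-base has to be on the test classpath. A minimal build.sbt sketch follows; the version shown is a placeholder and must be replaced with the artifact version matching the Spark release in use:

// build.sbt (sketch): add spark-testing-base as a test dependency.
// Versions follow the "<sparkVersion>_<libraryVersion>" naming convention;
// substitute the one that matches your Spark version.
libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "<sparkVersion>_<libraryVersion>" % "test"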

Regarding "apache-spark - Using TestHiveContext/HiveContext in unit tests", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34224636/
