scala - Writing to HBase in a Spark job: a conundrum with existential types

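I am trying to write a Spark job that should put its output into HBase. As far as I can tell, the right way to do this is to use the saveAsHadoopDataset method in org.apache.spark.rdd.PairRDDFunctions, which requires my RDD to be composed of pairs.

The saveAsHadoopDataset method requires a JobConf, and that is what I am trying to construct. According to this link, one thing I have to set on my JobConf is the output format (in fact, it does not work without it), like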

jobConfig.setOutputFormat(classOf[TableOutputFormat])

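The problem is that this apparently does not compile, because TableOutputFormat is generic, even though it ignores its type parameter. So I have tried various combinations, such as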

jobConfig.setOutputFormat(classOf[TableOutputFormat[Unit]])
jobConfig.setOutputFormat(classOf[TableOutputFormat[_]])

but in every case I get the error

required: Class[_ <: org.apache.hadoop.mapred.OutputFormat[_, _]]

Now, as far as I know, Class[_ <: org.apache.hadoop.mapred.OutputFormat[_, _]] translates to Class[T] forSome { type T <: org.apache.hadoop.mapred.OutputFormat[_, _] }. This is where I think I have a problem, because:

Class is invariant
TableOutputFormat[T] <: OutputFormat[T, Mutation], but
I am not sure how existential types interact with subtyping in the requirement T <: OutputFormat[_, _]
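
To make the three points above concrete, here is a minimal sketch that uses stand-in types rather than the real Hadoop and HBase classes. The definitions of OutputFormat, Mutation and TableOutputFormat below, as well as the names ExistentialSketch and accept, are hypothetical and exist only for illustration. The sketch suggests that invariance of Class rules out widening to one fixed type argument, while the existential bound is satisfied by any class whose type conforms to OutputFormat[_, _].

```scala
// Hypothetical stand-ins for the Hadoop/HBase types; not the real classes.
trait OutputFormat[K, V]
class Mutation
class TableOutputFormat[K] extends OutputFormat[K, Mutation]

object ExistentialSketch {
  // Same parameter shape as in the compiler error quoted in the question.
  def accept(c: Class[_ <: OutputFormat[_, _]]): Class[_ <: OutputFormat[_, _]] = c

  // Class is invariant, so assigning to a fixed type argument is rejected:
  // val bad: Class[OutputFormat[_, _]] = classOf[TableOutputFormat[Unit]] // does not compile

  // The existential bound only asks for *some* T <: OutputFormat[_, _];
  // both TableOutputFormat[Unit] and TableOutputFormat[_] satisfy it.
  val a = accept(classOf[TableOutputFormat[Unit]])
  val b = accept(classOf[TableOutputFormat[_]])
}
```

Under these assumptions the existential type itself is not the obstacle, so a remaining compile error would have to come from the class not belonging to the expected OutputFormat hierarchy at all.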

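Is there a way to obtain a subtype of OutputFormat[_, _] from TableOutputFormat? The problem seems to arise from the differences between generics in Java and in Scala. What can I do about this?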

Edit:

It turns out this is even more subtle. I tried defining a method for myself in the REPL

def foo(x: Class[_ <: OutputFormat[_, _]]) = x

and I can actually invoke it with

foo(classOf[TableOutputFormat[Unit]])

or even

foo(classOf[TableOutputFormat[_]])

for that matter. But I cannot call

jobConf.setOutputFormat(classOf[TableOutputFormat[_]])

The original signature of setOutputFormat in Java is void setOutputFormat(Class<? extends OutputFormat> theClass). How can I call it from Scala?

Best answer

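That is very weird. Are you 100% sure that your imports are right (edit: yes, this was the problem, see the comments) and that the artifact versions in your build file are correct? Maybe a code snippet from my working project will help you: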

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.mapred.JobConf
// Note the package: mapred (old API), not mapreduce
import org.apache.hadoop.hbase.mapred.TableOutputFormat

val conf = HBaseConfiguration.create()

val jobConfig: JobConf = new JobConf(conf, this.getClass)
jobConfig.setOutputFormat(classOf[TableOutputFormat])
// outputTable: name of the target HBase table (a String defined elsewhere)
jobConfig.set(TableOutputFormat.OUTPUT_TABLE, outputTable)
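
The import in the snippet above matters because HBase ships two classes named TableOutputFormat: org.apache.hadoop.hbase.mapred.TableOutputFormat is not generic and implements the old-API org.apache.hadoop.mapred.OutputFormat, whereas org.apache.hadoop.hbase.mapreduce.TableOutputFormat is generic and extends the new-API org.apache.hadoop.mapreduce.OutputFormat. The question does not show its imports, so assuming it used the mapreduce variant is a guess, although the generic TableOutputFormat[T] described there fits it. The sketch below mimics the situation with hypothetical stand-in objects mapred and mapreduce (not the real packages) and a hypothetical setOutputFormat.

```scala
// Hypothetical stand-ins that mirror the two Hadoop APIs; not the real classes.
object mapred {                        // plays the role of org.apache.hadoop.mapred
  trait OutputFormat[K, V]
  class TableOutputFormat extends OutputFormat[String, String]   // non-generic
}
object mapreduce {                     // plays the role of org.apache.hadoop.mapreduce
  abstract class OutputFormat[K, V]
  class TableOutputFormat[K] extends OutputFormat[K, String]     // generic
}

object ImportSketch {
  // Accepts only classes from the "mapred" hierarchy, like JobConf.setOutputFormat.
  def setOutputFormat(c: Class[_ <: mapred.OutputFormat[_, _]]): String = c.getName

  // Compiles: the class belongs to the expected hierarchy.
  val ok = setOutputFormat(classOf[mapred.TableOutputFormat])

  // Does not compile, whatever type argument is supplied: wrong hierarchy.
  // setOutputFormat(classOf[mapreduce.TableOutputFormat[_]])

  // The same mismatch is visible at runtime.
  val related =
    classOf[mapred.OutputFormat[_, _]].isAssignableFrom(classOf[mapreduce.TableOutputFormat[_]])
}
```

Here related evaluates to false, which is the runtime counterpart of the "required: Class[_ <: org.apache.hadoop.mapred.OutputFormat[_, _]]" compile error.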

And some of my dependencies:

"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
"org.apache.hbase" % "hbase-client" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-common" % "0.96.1.1-cdh5.0.0",

"org.apache.hbase" % "hbase-hadoop-compat" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-it" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-hadoop2-compat" % "0.96.1.1-cdh5.0.0",

"org.apache.hbase" % "hbase-prefix-tree" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-protocol" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-server" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-shell" % "0.96.1.1-cdh5.0.0",

"org.apache.hbase" % "hbase-testing-util" % "0.96.1.1-cdh5.0.0",
"org.apache.hbase" % "hbase-thrift" % "0.96.1.1-cdh5.0.0",

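Putting the pieces of the answer together, writing a pair RDD to HBase could look roughly like the sketch below. It assumes the mapred variant of TableOutputFormat and an HBase client of the 0.96 era, as in the dependency list above, where Put.add(family, qualifier, value) is available (later HBase releases replaced it with addColumn). The table name my_table, the column family cf, the qualifier col, the sample rows and the object name HBaseWriteSketch are all hypothetical, and the job needs a reachable HBase cluster with that table already created.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair RDD implicits, needed on older Spark versions

object HBaseWriteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-write-sketch"))

    // Same configuration steps as in the answer; "my_table" is a placeholder.
    val jobConfig = new JobConf(HBaseConfiguration.create(), getClass)
    jobConfig.setOutputFormat(classOf[TableOutputFormat])
    jobConfig.set(TableOutputFormat.OUTPUT_TABLE, "my_table")

    // Hypothetical input: (row key, value) pairs.
    val rows = sc.parallelize(Seq("row1" -> "a", "row2" -> "b"))

    // The mapred TableOutputFormat writes (ImmutableBytesWritable, Put) pairs.
    val puts = rows.map { case (key, value) =>
      val put = new Put(Bytes.toBytes(key))
      // "cf" and "col" are placeholder family and qualifier names.
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
      (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
    }

    puts.saveAsHadoopDataset(jobConfig)
    sc.stop()
  }
}
```

The Put objects are created inside the map and written by the same task, so they are never shuffled between executors in this sketch.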
A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/23625896/
