我根据 Spark 快速入门指南执行了用 Java 编写的简单代码:
public static void main(String[] args) {
SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]");
JavaSparkContext sc = new JavaSparkContext(conf);
Accumulator<Integer> counter = sc.accumulator(0);
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize(data);
rdd.foreach(counter::add);
System.out.println("Counter value " + counter);
}
它按预期打印 "Counter value 15"
。
我有用 Scala 编写的具有相同逻辑的代码:
object Counter extends App {
val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
val sc = new SparkContext(conf)
val counter = sc.accumulator(0)
val data = Array(1, 2, 3, 4, 5)
val rdd = sc.parallelize(data)
rdd.foreach(x => counter += x)
println(s"Counter value: $counter")
}
但它每次都打印不正确的结果 (<15)。我的 Scala 代码有什么问题?
Java spark lib "org.apache.spark:spark-core_2.10:1.6.1"
Scala spark lib "org.apache.spark" %% "spark-core" % "1.6.1"
最佳答案
quick-start 中的建议文档说:
Note that applications should define a main() method instead of extending scala.App. Subclasses of scala.App may not work correctly.
也许这就是问题所在?
尝试:
object Counter {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
val sc = new SparkContext(conf)
val counter = sc.accumulator(0)
val data = Array(1, 2, 3, 4, 5)
val rdd = sc.parallelize(data)
rdd.foreach(x => counter += x)
println(s"Counter value: $counter")
}
}
关于java - 相同的 Scala 和 Java spark 函数产生不同的结果,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/36503480/