hadoop - Failed to get broadcast_1_piece0 of broadcast_1 in a Spark Streaming job

Tags: hadoop apache-spark spark-streaming bigdata

I am running a Spark job on YARN in cluster mode. The job consumes messages from a Kafka direct stream, uses broadcast variables, and checkpoints every 30 seconds. The first time I start the job it runs fine, without any issues. If I kill the job and restart it, the executors throw the following exception as soon as a message is received from Kafka:

java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1178)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at net.juniper.spark.stream.LogDataStreamProcessor$2.call(LogDataStreamProcessor.java:177)
    at net.juniper.spark.stream.LogDataStreamProcessor$2.call(LogDataStreamProcessor.java:1)
    at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:172)
    at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:172)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1298)
    at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1298)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Does anyone know how to resolve this error?

Spark version: 1.5.0

CDH 5.5.1
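
For context, a minimal sketch of the kind of job described above: a 30-second-batch streaming context with checkpointing, a Kafka direct stream, and a broadcast variable dereferenced in executor code. The class name comes from the stack trace, but the body is hypothetical; the broker, topic, checkpoint path, and lookup map are placeholders, and Java 8 lambdas with the Spark 1.5 APIs are assumed:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.broadcast.Broadcast;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class LogDataStreamProcessor {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("LogDataStreamProcessor");
            // 30-second batch interval, checkpointing enabled on the streaming context
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
            jssc.checkpoint("hdfs:///user/spark/checkpoints/logdata"); // placeholder path

            // Broadcast created on the driver, referenced later in executor code
            Broadcast<Map<String, String>> lookup =
                jssc.sparkContext().broadcast(new HashMap<String, String>());

            Map<String, String> kafkaParams = new HashMap<String, String>();
            kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder broker
            Set<String> topics = Collections.singleton("logs");      // placeholder topic

            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

            // Executor-side code dereferences the broadcast for each record; this is
            // the kind of call site that fails after a restart (line 177 in the trace).
            stream.foreachRDD(rdd -> {
                rdd.foreach(record -> {
                    String value = lookup.value().get(record._2());
                });
                return null; // Spark 1.5's Java foreachRDD takes a Function<R, Void>
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }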

Best answer

When something works only on the first run, the cause almost always lies in the checkpointed data. Furthermore, checkpointing only kicks in once there is something to checkpoint, which here is the first message from Kafka.
I would suggest checking whether the job really is dead, i.e., whether the process is perhaps still running on the machine that executed it.
Try running a simple ps -fe and see if it is still running. If two processes try to use the same checkpoint folder, it will always fail.
Hope this helps.
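
One more note on the restart path the answer describes: the Spark 1.5 programming guide's pattern for a driver that must survive restarts is JavaStreamingContext.getOrCreate, which rebuilds the whole streaming context (including Kafka offsets) from the checkpoint directory when one exists, and only invokes the factory on a cold start. A minimal sketch, with the checkpoint path again a placeholder:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

    public class RestartableDriver {
        private static final String CHECKPOINT_DIR =
            "hdfs:///user/spark/checkpoints/logdata"; // placeholder path

        public static void main(String[] args) {
            JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(
                CHECKPOINT_DIR,
                new JavaStreamingContextFactory() {
                    @Override
                    public JavaStreamingContext create() {
                        // Runs only when no checkpoint exists; on restart the DStream
                        // graph is deserialized from the checkpoint instead.
                        SparkConf conf = new SparkConf().setAppName("LogDataStreamProcessor");
                        JavaStreamingContext ctx =
                            new JavaStreamingContext(conf, Durations.seconds(30));
                        ctx.checkpoint(CHECKPOINT_DIR);
                        // ... define the Kafka direct stream and transformations here ...
                        return ctx;
                    }
                });

            jssc.start();
            jssc.awaitTermination();
        }
    }

Two caveats consistent with the answer: two drivers pointing at the same checkpoint directory will still conflict, so confirm the old process is really gone first; and the programming guide notes that broadcast variables (and accumulators) are not recovered from a checkpoint, so recovered executor code that still references a driver-created broadcast can fail with exactly this "Failed to get broadcast_…" error.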

Regarding "hadoop - Failed to get broadcast_1_piece0 of broadcast_1 in a Spark Streaming job", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36404726/
