apache-spark - Driver stops executors for no apparent reason

Tags: apache-spark spark-structured-streaming spark-streaming-kafka

I have an application based on Spark Structured Streaming 3 with Kafka that processes user logs. After a while the driver starts killing the executors, and I don't understand why. The executor logs contain no errors. I'm including the executor and driver logs below.

On executor 1:

20/08/31 10:01:31 INFO executor.Executor: Finished task 5.0 in stage 791.0 (TID 46411). 1759 bytes result sent to driver
20/08/31 10:01:33 INFO executor.YarnCoarseGrainedExecutorBackend: Driver commanded a shutdown

On executor 2:

20/08/31 10:14:33 INFO executor.YarnCoarseGrainedExecutorBackend: Driver commanded a shutdown
20/08/31 10:14:34 INFO memory.MemoryStore: MemoryStore cleared
20/08/31 10:14:34 INFO storage.BlockManager: BlockManager stopped
20/08/31 10:14:34 INFO util.ShutdownHookManager: Shutdown hook called

On the driver:

20/08/31 10:01:33 ERROR cluster.YarnScheduler: Lost executor 3 on xxx.xxx.xxx.xxx: Executor heartbeat timed out after 130392 ms

20/08/31 10:53:33 ERROR cluster.YarnScheduler: Lost executor 2 on xxx.xxx.xxx.xxx: Executor heartbeat timed out after 125773 ms
20/08/31 10:53:33 ERROR cluster.YarnScheduler: Ignoring update with state FINISHED for TID 129308 because its task set is gone (this is likely the result of receiving duplicate task finished status updates) or its executor has been marked as failed.
20/08/31 10:53:33 ERROR cluster.YarnScheduler: Ignoring update with state FINISHED for TID 129314 because its task set is gone (this is likely the result of receiving duplicate task finished status updates) or its executor has been marked as failed.
20/08/31 10:53:33 ERROR cluster.YarnScheduler: Ignoring update with state FINISHED for TID 129311 because its task set is gone (this is likely the result of receiving duplicate task finished status updates) or its executor has been marked as failed.
20/08/31 10:53:33 ERROR cluster.YarnScheduler: Ignoring update with state FINISHED for TID 129305 because its task set is gone (this is likely the result of receiving duplicate task finished status updates) or its executor has been marked as failed.

Has anyone run into the same problem and solved it?

Best Answer

Looking at the information at hand:

  • no errors
  • the driver commanded a shutdown
  • the YARN logs report "state FINISHED"

this looks like expected behavior.

This typically happens when you forget to wait for the termination of a Spark streaming query. If your code does not end with

query.awaitTermination()

your streaming application simply shuts down once all the code has been executed.
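As a minimal sketch of the fix (the broker address, topic name, and app name are placeholders, not taken from the question), a structured streaming job should block on `awaitTermination()` so the driver stays alive while the stream runs:

```scala
import org.apache.spark.sql.SparkSession

object UserLogsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("user-logs-stream") // illustrative name
      .getOrCreate()

    // Hypothetical Kafka source; substitute your own brokers and topic
    val logs = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "user-logs")
      .load()

    val query = logs.writeStream
      .format("console")
      .start()

    // Without this call, main() returns right after start(), the driver
    // exits, and YARN tears down the executors ("Driver commanded a
    // shutdown") even though the query is still processing data.
    query.awaitTermination()
  }
}
```

This sketch requires a running Spark cluster and Kafka broker, so it is not runnable standalone; the essential point is only the final blocking call.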

Regarding "apache-spark - driver stops executors for no apparent reason", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63668560/
