apache-spark - Spark OutOfMemoryError

Tags: apache-spark

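When I submit a Spark job that sends messages to Kafka, I get an OutOfMemoryError (OOME). Each message sent to Kafka is 675 bytes, and the error only shows up when the executors are about to shut down.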

Diagnostics: Failing this attempt. Failing the application.
  ApplicationMaster host: N/A
  ApplicationMaster RPC port: -1
  start time: 1441611385047
  final status: FAILED

Here is the YARN log:


INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
WARN thread.QueuedThreadPool: 7 threads could not be stopped
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-12"
Exception in thread "Thread-3" 


Exception in thread "shuffle-client-4" Exception in thread "shuffle-server-7" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "shuffle-client-4"


INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
Exception in thread "LeaseRenewer:user@dom" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LeaseRenewer:user@dom"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.actor.default-dispatcher-16"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-6"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-5"
Exception in thread "Thread-3" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-3"

In rare cases the application is reported as SUCCEEDED, but the YARN log still contains the OOME:

INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
INFO storage.MemoryStore: MemoryStore cleared
INFO storage.BlockManager: BlockManager stopped
INFO storage.BlockManagerMaster: BlockManagerMaster stopped
INFO spark.SparkContext: Successfully stopped SparkContext
INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
Exception in thread "Thread-3" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-3"


Have you tried increasing MaxPermSize, like this?

[Image from the original answer; not available]
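The image attached to the answer is not available, so the exact setting it showed is unknown. As a rough sketch only, one common way to raise MaxPermSize for a job submitted to YARN in cluster mode is through the driver and executor JVM options. The value `256m`, the class name `com.example.KafkaSenderJob`, and the jar name `kafka-sender-job.jar` are all hypothetical placeholders, not taken from the question.

```shell
# Sketch, assuming a Spark 1.x job on YARN in cluster mode and a Java 7 JVM.
# -XX:MaxPermSize only has an effect on Java 7 and earlier; Java 8+ removed
# the permanent generation and ignores this flag.
# 256m is an illustrative value; the class and jar names are hypothetical.
spark-submit \
  --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" \
  --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=256m" \
  --class com.example.KafkaSenderJob \
  kafka-sender-job.jar
```

Because the failing threads in the log are driver-side (`sparkDriver-*`), the driver option is probably the more relevant of the two; the executor option is included only for completeness.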

Source: a similar question about apache-spark - Spark OutOfMemoryError on Stack Overflow: https://stackoverflow.com/questions/32433432/

