apache-spark - How to control the heap size of the Spark History Server?

Tags: apache-spark cloudera-cdh

We are running Spark (1.2) on YARN with CDH 5.3.2, along with the Spark History Server.

The History Server works fine for small jobs, but for a few large jobs it fails to retrieve the logs/job history and shows the following error:

  2015-04-09 09:50:48,061 WARN org.eclipse.jetty.servlet.ServletHandler: Error for /history/application_1428034115331_31584
org.spark-project.guava.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: Java heap space
at org.spark-project.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
at org.spark-project.guava.common.cache.LocalCache.get(LocalCache.java:4000)
at org.spark-project.guava.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at org.spark-project.guava.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:85)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.json4s.MonadicJValue$$anonfun$org$json4s$MonadicJValue$$findDirectByName$1.apply(MonadicJValue.scala:26)
at org.json4s.MonadicJValue$$anonfun$org$json4s$MonadicJValue$$findDirectByName$1.apply(MonadicJValue.scala:22)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.json4s.MonadicJValue.org$json4s$MonadicJValue$$findDirectByName(MonadicJValue.scala:22)
at org.json4s.MonadicJValue.$bslash(MonadicJValue.scala:16)
at org.apache.spark.util.JsonProtocol$.taskInfoFromJson(JsonProtocol.scala:560)
at org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:465)
at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:425)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2$$anonfun$apply$1.apply(ReplayListenerBus.scala:71)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2$$anonfun$apply$1.apply(ReplayListenerBus.scala:69)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2.apply(ReplayListenerBus.scala:69)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2.apply(ReplayListenerBus.scala:55)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$getAppUI$1.apply(FsHistoryProvider.scala:128)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$getAppUI$1.apply(FsHistoryProvider.scala:117)
at scala.Option.map(Option.scala:145)
at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:117)
at org.apache.spark.deploy.history.HistoryServer$$anon$3.load(HistoryServer.scala:55)
at org.apache.spark.deploy.history.HistoryServer$$anon$3.load(HistoryServer.scala:53)
at org.spark-project.guava.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.spark-project.guava.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at org.spark-project.guava.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)

I haven't found any way to set the heap size for the Spark History Server.
Or is this related to the YARN history server?

Thanks

Best answer

Setting SPARK_DAEMON_MEMORY=2g in spark-env.sh worked for me.
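As a minimal sketch of what that looks like (the exact file location is an assumption; on a CDH-managed cluster spark-env.sh is usually managed through Cloudera Manager, e.g. via an environment safety valve, rather than edited by hand):

    # spark-env.sh
    # Heap size for Spark daemon processes (standalone master/worker and the History Server).
    # Raise this if the History Server hits OutOfMemoryError while replaying large event logs.
    export SPARK_DAEMON_MEMORY=2g

After changing it, restart the History Server so the new heap size takes effect (for example with sbin/stop-history-server.sh and sbin/start-history-server.sh in a plain Spark install, or by restarting the Spark History Server role from Cloudera Manager).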

Regarding "apache-spark - How to control the heap size of the Spark History Server?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29543854/
