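I'm trying to debug a PySpark program and, quite frankly, I'm stumped.
I'm seeing the following error in the logs. I've verified the input parameters, and everything appears to be in order.
The driver and executors appear to be set up properly: each node is using only about 3MB of its 7GB. I can see that the DAG plan being created is huge. Could that be the cause?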
17/02/18 00:59:02 ERROR Utils: throw uncaught fatal error in thread SparkListenerBus
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3736)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:103)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:134)
at org.apache.spark.scheduler.EventLoggingListener.onOtherEvent(EventLoggingListener.scala:202)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:67)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3736)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:103)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:134)
at org.apache.spark.scheduler.EventLoggingListener.onOtherEvent(EventLoggingListener.scala:202)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:67)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
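Best answer
The workaround for this error is the following setting:
spark.eventLog.enabled=false
The stack trace shows that the OutOfMemoryError is thrown in the driver's SparkListenerBus thread while EventLoggingListener converts an event to JSON (JsonProtocol.sparkEventToJson) for the event log. Disabling event logging skips that step. The downside is that you no longer get any event logs.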
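Here is a minimal sketch of applying that setting when launching a job. It assumes the job is started with spark-submit, which accepts settings as `--conf key=value`. The helper `build_spark_submit_cmd` and the script name `my_job.py` are hypothetical and only serve as illustration. If the session is instead created in code, the equivalent is `SparkSession.builder.config("spark.eventLog.enabled", "false")` before `getOrCreate()`. That form only takes effect if no SparkContext is already running.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_submit_cmd(app: str, extra_conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-submit argument list with event logging turned off.

    Hypothetical helper for illustration; `app` is the PySpark script to run.
    """
    conf = dict(extra_conf or {})
    # Workaround from the answer, applied last so no other setting can re-enable it.
    conf["spark.eventLog.enabled"] = "false"
    cmd = ["spark-submit"]
    for key, value in conf.items():
        # Each setting is passed as its own --conf key=value pair.
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # "my_job.py" is a placeholder for your own PySpark script.
    print(shlex.join(build_spark_submit_cmd("my_job.py")))
```

The helper builds the argument list rather than running Spark, so it can be passed straight to `subprocess.run`. The trade-off is the same as above: the job no longer writes event logs.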
Source: "java - Heap space error: SparkListenerBus" on Stack Overflow: https://stackoverflow.com/questions/48837010/