java - shuffle内存池空闲: SPARK with Java

标签 java apache-spark

我想通过像这样的键加入两个列表(NoHeaderIndexed 和 NoFirstIndexed):

    final  Broadcast<JavaPairRDD<Long, Tuple2<String, String>>> c = ctx.broadcast(noHeaderIndexed);
    JavaPairRDD<Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>> rs = noFirstIndexed.mapToPair(new PairFunction<Tuple2<Long, Tuple2<String, String>>, Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>>() {
        @Override
        public Tuple2<Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>> call(Tuple2<Long, Tuple2<String, String>> longTuple2Tuple2) throws Exception {
            String s1 = "";
            if (c.value().lookup(longTuple2Tuple2._1).get(0)._1 != null)
                s1 = c.value().lookup(longTuple2Tuple2._1).get(0)._1;

            String s2 = "";
            if (c.value().lookup(longTuple2Tuple2._1).get(0)._2 != null)
                s2 = c.value().lookup(longTuple2Tuple2._1).get(0)._2;
            return new Tuple2<Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>>(new Tuple2<Tuple2<String, String>, Long>(new Tuple2<String, String>(longTuple2Tuple2._2._1,longTuple2Tuple2._2._2),longTuple2Tuple2._1),new Tuple2<Tuple2<String, String>, Long>(new Tuple2<String, String>(s1,s2),longTuple2Tuple2._1 ) );
        }
    });

    //writeResult(rs, "rs.txt");
    rs.coalesce(1,true).saveAsTextFile(path+ "rs");    

但是当我尝试执行它时,它显示:

INFO ShuffleMemoryManager: Thread 61 waiting for at least 1/2N of shuffle memory pool to be free    

并且它不会终止执行。您能否向我解释一下这个问题以及如何解决它。

提前谢谢您。

最佳答案

在此命令中

rs.coalesce(1,true).saveAsTextFile(path+ "rs");

您只创建一个分区,以便所有数据都将到达一个节点。您需要增加分区数量

根据您的数据大小尝试此操作

rs.coalesce(10,true).saveAsTextFile(path+ "rs");

关于java - shuffle内存池空闲: SPARK with Java,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/26757945/

相关文章:

java - 对 Java 8 流进行分区

java - Spark 1.6 java.io.IOException : Filesystem closed

apache-spark - spark 提交应用程序中的 Scala ScriptEngine 问题

java - 如何禁用 h :commandButton without preventing the action and actionListener from being called?

java - 如何通过多级extends来解析一个泛型?

java - 内存游戏随机颜色

apache-spark - 撤消缩放数据 pyspark

java - 如何避免连续出现 "Resetting offset"和 "Seeking to LATEST offset"?

hadoop - 如何在 apache zeppelin 中使用 hdfs shell 命令?

java - Selenium 无法点击 google SERP 中的链接