scala - Spark : scala - how to convert collection from RDD to another RDD

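How can I convert the collection returned by take(5) into another RDD, so that I can save the first 5 records to an output file?

I cannot chain saveAsTextFile after take, because take returns a local Array rather than an RDD (that is why the line below is commented out). If I call saveAsTextFile on the sorted RDD instead, it stores all of the RDD's records in sorted order, so the first 5 records are the top 5 countries, but I only want to store those first 5 records. Is it possible to convert the collection returned by take(5) into an RDD?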

val Strips =  txtFileLines.map(_.split(","))
                         .map(line => (line(0) + "," + (line(7).toInt + line(8).toInt)))
                         .sortBy(x => x.split(",")(1).trim().toInt, ascending=false)
                         .take(5)
                       //.saveAsTextFile("output\\country\\byStripsBar")

Solution: sc.parallelize(Strips, 1).saveAsTextFile("output\\country\\byStripsBar")

Best answer

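import org.apache.spark.sql.Row  // assumes rdd is an RDD[Row]

// take(5) returns a local Array on the driver, not an RDD
val rowsArray: Array[Row] = rdd.take(5)
// parallelize turns the array back into an RDD with a single partition
val slicedRdd = sparkContext.parallelize(rowsArray, 1)

slicedRdd.saveAsTextFile("specify path here")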
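
For completeness, here is a minimal self-contained sketch of the same idea applied to "country,count" strings like those in the question. The object name `TopFiveToFile`, the helper `count`, the sample data, and the local master setting are all assumptions made for illustration; they do not come from the original post. It uses `top` with a custom ordering in place of `sortBy(...).take(5)`, which should give the same five records without sorting the whole RDD first.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TopFiveToFile {
  // Hypothetical helper: parses the numeric field of a "country,count" line.
  def count(line: String): Int = line.split(",")(1).trim.toInt

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("TopFiveToFile").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data standing in for the mapped txtFileLines RDD.
      val totals = sc.parallelize(
        Seq("A,10", "B,40", "C,25", "D,5", "E,30", "F,15", "G,35"))

      // top(5) brings the five largest elements (by count) to the driver
      // as a local Array, in descending order.
      val topFive: Array[String] = totals.top(5)(Ordering.by(count))

      // Wrap the local array in a single-partition RDD so that
      // saveAsTextFile writes a single part file.
      sc.parallelize(topFive, 1).saveAsTextFile("output/country/byStripsBar")
    } finally {
      sc.stop()
    }
  }
}
```

Note that saveAsTextFile writes a directory of part files and fails if the target directory already exists, so the output path has to be removed or changed between runs.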

Regarding scala - Spark : scala - how to convert collection from RDD to another RDD, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37780902/
