scala - How does the fold action work in Spark?

Below is a Scala example of Spark's fold action:

val rdd1 = sc.parallelize(List(1,2,3,4,5), 3)
rdd1.fold(5)(_ + _)

This produces the output 35. Can someone explain in detail how this output is computed?

Best answer

Taken from the Scaladocs here (emphasis mine):

@param zeroValue the initial value for the accumulated result of each partition for the op operator, and also the initial value for the combine results from different partitions for the op operator - this will typically be the neutral element (e.g. Nil for list concatenation or 0 for summation)
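The contract in that docstring can be modelled in plain Scala without Spark. This is a minimal sketch, assuming the partitions are given explicitly as a list of lists; `modelFold` is a hypothetical helper, not a Spark API. It folds each partition starting from `zeroValue`, then folds the per-partition results starting from `zeroValue` again, which shows why a neutral element makes the result independent of partitioning while a non-neutral one does not.

```scala
// Hypothetical plain-Scala model of RDD.fold's documented contract (no Spark needed).
// Assumption: partitions are supplied explicitly as List[List[T]].
def modelFold[T](partitions: List[List[T]], zeroValue: T)(op: (T, T) => T): T = {
  // 1) Each partition is folded starting from zeroValue.
  val perPartition = partitions.map(_.foldLeft(zeroValue)(op))
  // 2) The per-partition results are combined, again starting from zeroValue.
  perPartition.foldLeft(zeroValue)(op)
}

val oneSplit   = List(List(1, 2, 3, 4, 5))
val threeSplit = List(List(1), List(2, 3), List(4, 5))

// With the neutral element 0, the partitioning does not change the result.
println(modelFold(oneSplit, 0)(_ + _))   // 15
println(modelFold(threeSplit, 0)(_ + _)) // 15

// With a non-neutral zeroValue such as 5, it does.
println(modelFold(oneSplit, 5)(_ + _))   // 5 + (5 + 15) = 25
println(modelFold(threeSplit, 5)(_ + _)) // 35
```

The model combines partition results in partition order; Spark may merge them in the order tasks finish, so the model is only a faithful stand-in for operators that are associative and commutative, such as `+`.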

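In your case zeroValue is added four times: once for each of the three partitions, plus once more when the partition results are combined. So the result is: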

(5 + 1) + (5 + 2 + 3) + (5 + 4 + 5) + 5 = 6 + 10 + 14 + 5 = 35 // (the extra 5 is for combining results)
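The partition breakdown above (`[1]`, `[2, 3]`, `[4, 5]`) can be reproduced in plain Scala. This sketch assumes `parallelize` on a generic Seq splits it into index ranges `[i * n / numSlices, (i + 1) * n / numSlices)`, which is my understanding of how Spark slices such collections; `slice` and `simulatedFold` are hypothetical helpers, not Spark APIs. For `+` it also shows the general pattern `sum + zeroValue * (numSlices + 1)`.

```scala
// Hypothetical helper: split data into numSlices contiguous ranges,
// slice i covering indices [i * n / numSlices, (i + 1) * n / numSlices).
def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
  (0 until numSlices).map { i =>
    val start = (i.toLong * data.length / numSlices).toInt
    val end   = ((i + 1).toLong * data.length / numSlices).toInt
    data.slice(start, end)
  }

// Fold each slice from zeroValue, then fold the slice results from zeroValue.
def simulatedFold(data: Seq[Int], numSlices: Int, zeroValue: Int): Int = {
  val partitionSums = slice(data, numSlices).map(_.foldLeft(zeroValue)(_ + _))
  partitionSums.foldLeft(zeroValue)(_ + _)
}

val data = List(1, 2, 3, 4, 5)
println(slice(data, 3))            // Vector(List(1), List(2, 3), List(4, 5))
println(simulatedFold(data, 3, 5)) // 35 = 15 + 5 * (3 + 1)

// The same data gives a different answer for each partition count.
(1 to 4).foreach(n => println(s"$n partitions -> ${simulatedFold(data, n, 5)}"))
```

With 1, 2, 3 and 4 partitions this yields 25, 30, 35 and 40, which is why a non-neutral zeroValue makes `fold` depend on how the RDD happens to be partitioned.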

Source: the Stack Overflow question "scala - How does the fold action work in Spark?": https://stackoverflow.com/questions/48358428/
