scala - Summing distances in an Apache Spark DataFrame

Tags: scala apache-spark spark-dataframe graphframes

The following code produces a DataFrame in which each column holds three values, as shown below.

    import org.graphframes._
    import org.apache.spark.sql.DataFrame
    val v = sqlContext.createDataFrame(List(
      ("1", "Al"),
      ("2", "B"),
      ("3", "C"),
      ("4", "D"),
      ("5", "E")
    )).toDF("id", "name")

    val e = sqlContext.createDataFrame(List(
      ("1", "3", 5),
      ("1", "2", 8),
      ("2", "3", 6),
      ("2", "4", 7),
      ("2", "1", 8),
      ("3", "1", 5),
      ("3", "2", 6),
      ("4", "2", 7),
      ("4", "5", 8),
      ("5", "4", 8)
    )).toDF("src", "dst", "property")
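    // Build the graph and run a breadth-first search from vertex 1 to vertex 5.
    val g = GraphFrame(v, e)
    val paths: DataFrame = g.bfs.fromExpr("id = '1'").toExpr("id = '5'").run()
    paths.show()

    // Keep only the edge columns (e0, e1, ...) of the BFS result.
    val df = paths
    df.select(df.columns.filter(_.startsWith("e")).map(df(_)): _*).show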

The output of the code above is:

    +-------+-------+-------+                                                       
    |     e0|     e1|     e2|
    +-------+-------+-------+
    |[1,2,8]|[2,4,7]|[4,5,8]|
    +-------+-------+-------+

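In the output above, each column holds three values, which can be read as follows.

e0:
source 1, destination 2, distance 8

e1:
source 2, destination 4, distance 7

e2:
source 4, destination 5, distance 8

Basically, e0, e1, and e2 are edges. I want to sum the third element of each column, i.e. add up the distance of every edge to get the total distance of the path. How can I do this?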
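To make the goal concrete: for the single path shown above, the expected total is 8 + 7 + 8 = 23. The following is a minimal plain-Scala sketch of that arithmetic, with no Spark involved. The `Edge` case class and the `PathTotalByHand` object are hypothetical stand-ins introduced only for illustration; they are not part of GraphFrames.

```scala
object PathTotalByHand {
  // Hypothetical stand-in for one edge struct (src, dst, property) from the output above.
  final case class Edge(src: String, dst: String, property: Int)

  // The three edges of the path, paired with the column names they appear under.
  val path: Seq[(String, Edge)] = Seq(
    "e0" -> Edge("1", "2", 8),
    "e1" -> Edge("2", "4", 7),
    "e2" -> Edge("4", "5", 8)
  )

  // Add up the third element (the distance) of every edge.
  val expectedTotal: Int = path.map { case (_, edge) => edge.property }.sum

  def main(args: Array[String]): Unit =
    println(expectedTotal) // 8 + 7 + 8 = 23
}
```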

Best answer

It can be done like this:

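    import org.apache.spark.sql.functions.col

    // Build one Column expression that adds the `property` field of every edge column.
    val total = df.columns.filter(_.startsWith("e"))
      .map(c => col(s"$c.property")) // or col(c).getItem("property")
      .reduce(_ + _)

    // withColumn returns a new DataFrame; df itself is unchanged.
    df.withColumn("total", total).show()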
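The same idea can be tried without GraphFrames on the classpath. The sketch below assumes Spark 2.x or later (it uses `SparkSession`) and builds a one-row DataFrame shaped like the edge columns of the BFS output. The names `PathDistance`, `Edge`, `PathRow`, `withTotalDistance` and `samplePaths` are hypothetical helpers, not part of Spark or GraphFrames. As a small variation on the answer, it selects edge columns with the regex `e\d+` rather than `startsWith("e")`, so that an unrelated column whose name happens to begin with "e" would not be picked up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PathDistance {
  // Hypothetical stand-ins for the edge structs in the BFS result.
  final case class Edge(src: String, dst: String, property: Int)
  final case class PathRow(e0: Edge, e1: Edge, e2: Edge)

  // Adds a `total` column holding the sum of `property` over all edge columns.
  def withTotalDistance(paths: DataFrame): DataFrame = {
    // Match e0, e1, e2, ... exactly, so other columns starting with "e" are ignored.
    val edgeColumns = paths.columns.filter(_.matches("e\\d+"))
    // reduce throws on an empty array, so this assumes at least one edge column.
    val total = edgeColumns.map(c => col(s"$c.property")).reduce(_ + _)
    paths.withColumn("total", total)
  }

  // One-row DataFrame shaped like the edge columns of the BFS output above.
  def samplePaths(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(PathRow(Edge("1", "2", 8), Edge("2", "4", 7), Edge("4", "5", 8))).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("path-distance")
      .getOrCreate()

    val result = withTotalDistance(samplePaths(spark))
    result.show(truncate = false)

    // Read the total back as a plain Int on the driver.
    println(result.select("total").head().getInt(0))

    spark.stop()
  }
}
```

Because the sum is built from whatever edge columns are present, the same function works for paths of any length of one edge or more. If a result might contain no edge columns at all, folding from `lit(0)` instead of calling `reduce` would avoid the exception on an empty array.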

Regarding "scala - Summing distances in an Apache Spark DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41043598/
