val partitionsColumns = "idnum,monthnum"
val partitionsColumnsList = partitionsColumns.split(",").toList
val loc = "/data/omega/published/invoice"
val df = sqlContext.read.parquet(loc)
val windowFunction = Window.partitionBy (partitionsColumnsList:_*).orderBy(df("effective_date").desc)
<console>:38: error: overloaded method value partitionBy with alternatives:
  (cols: org.apache.spark.sql.Column*)org.apache.spark.sql.expressions.WindowSpec <and>
  (colName: String, colNames: String*)org.apache.spark.sql.expressions.WindowSpec
 cannot be applied to (String)
       val windowFunction = Window.partitionBy(partitionsColumnsList:_*).orderBy(df("effective_date").desc)
Is it possible to pass a list of columns to the partitionBy
method in Spark/Scala?
I have implemented passing a single column to the partitionBy
method, and that works. I don't know how to pass multiple columns to the partitionBy
method.
Basically, I want to pass a List(Columns)
to the partitionBy
method.
The Spark version is 1.6.
Best Answer
Window.partitionBy
has the following definitions:
static WindowSpec partitionBy(Column... cols)
Creates a WindowSpec with the partitioning defined.
static WindowSpec partitionBy(scala.collection.Seq<Column> cols)
Creates a WindowSpec with the partitioning defined.
static WindowSpec partitionBy(String colName, scala.collection.Seq<String> colNames)
Creates a WindowSpec with the partitioning defined.
static WindowSpec partitionBy(String colName, String... colNames)
Creates a WindowSpec with the partitioning defined.
In your example,
val partitionsColumnsList = partitionsColumns.split(",").toList
gives you a List[String]. Expanding it with : _* matches neither the Column* overload nor the (colName: String, colNames: String*) overload, which is why the compiler rejects it. You can use it like this instead:
Window.partitionBy(partitionsColumnsList.map(col(_)):_*).orderBy(df("effective_date").desc)
或者
Window.partitionBy(partitionsColumnsList.head, partitionsColumnsList.tail: _*).orderBy(df("effective_date").desc)
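Putting the first variant together, here is a minimal sketch. It assumes the same Parquet location and column names as the question; the row_number-based "latest record per partition" step is illustrative and not from the original post:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val partitionsColumnsList = "idnum,monthnum".split(",").toList

// Map each column name (String) to a Column, then splat the resulting
// Seq[Column] into the Column* overload of partitionBy.
val windowSpec = Window
  .partitionBy(partitionsColumnsList.map(col(_)): _*)
  .orderBy(col("effective_date").desc)

val df = sqlContext.read.parquet("/data/omega/published/invoice")

// Illustrative use of the window: keep only the most recent row
// per (idnum, monthnum) partition.
val latest = df
  .withColumn("rn", row_number().over(windowSpec))
  .filter(col("rn") === 1)
  .drop("rn")
```

The second variant (head plus tail) uses the (colName: String, colNames: String*) overload instead and avoids the col conversion, but both produce the same WindowSpec.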
Regarding apache-spark - how to apply multiple columns in Window partitionBy in Spark Scala, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56626446/