I want to add selected columns to a DataFrame when they are not already present.
val columns = List("Col1", "Col2", "Col3")
for (i <- columns)
  if (!df.schema.fieldNames.contains(i))
    df.withColumn(i, lit(0))
When I select columns from the DataFrame afterwards, only the old columns are present; the new columns never appear.
Best Answer
This is more a question about how to do it in Scala than in Spark, and it is an excellent use case for foldLeft (my favorite!).
// start with an empty DataFrame, but could be anything
val df = spark.emptyDataFrame
val columns = Seq("Col1", "Col2", "Col3")
val columnsAdded = columns.foldLeft(df) { case (d, c) =>
  if (d.columns.contains(c)) {
    // column exists; skip it
    d
  } else {
    // column is not available, so add it with a default value of 0
    d.withColumn(c, lit(0))
  }
}
scala> columnsAdded.printSchema
root
|-- Col1: integer (nullable = false)
|-- Col2: integer (nullable = false)
|-- Col3: integer (nullable = false)
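If the foldLeft step itself is unfamiliar, the same pattern can be seen without Spark at all. A minimal sketch using a plain Map as a stand-in for the DataFrame's schema (the names `existing` and `withDefaults` are illustrative, not from the original answer):

```scala
// foldLeft threads an accumulator (here a Map standing in for the
// DataFrame) through the column list, adding only the missing keys.
val existing = Map("Col1" -> 7)          // pretend this is the current data
val columns  = Seq("Col1", "Col2", "Col3")

val withDefaults = columns.foldLeft(existing) { case (acc, c) =>
  if (acc.contains(c)) acc               // already there; keep the accumulator as-is
  else acc + (c -> 0)                    // missing; add it with a default of 0
}
// withDefaults == Map("Col1" -> 7, "Col2" -> 0, "Col3" -> 0)
```

This also shows why the loop in the question had no effect: `withColumn` returns a new DataFrame rather than mutating the old one, and the loop discarded that return value. foldLeft works because it feeds each step's result into the next step.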
Regarding "scala - How do I add a new column to a DataFrame when the name is missing?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43468515/