I have a DataFrame whose column names contain dots.
Example (from df.printSchema):
user.id_number
user.name.last
user.phone.mobile
etc. I want to rename the columns in the schema by replacing each dot (.) with an underscore (_):
user_id_number
user_name_last
user_phone_mobile
Note: the input data for this DF is in JSON format (non-relational data, as in NoSQL stores).
Best answer
Use .map or .withColumnRenamed to replace . with _.
Examples:
1. Using .toDF with .map:
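import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell
val df = Seq(("1", "2", "3")).toDF("user.id_number", "user.name.last", "user.phone.mobile")
// String.replace treats "." literally
df.toDF(df.columns.map(x => x.replace(".", "_")): _*).show()
// using replaceAll: the first argument is a regex, so the dot must be escaped
df.toDF(df.columns.map(x => x.replaceAll("\\.", "_")): _*).show()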
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
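The two variants above differ only in how the dot is matched: String.replace treats its arguments literally, while replaceAll interprets the first argument as a regular expression, where an unescaped . matches any character. A small pure-Scala sketch (no Spark needed; the object name is hypothetical) shows why the answer writes "\\." for replaceAll:

```scala
object ReplaceVsReplaceAll {
  val name = "user.name.last"

  // replace: literal substring replacement
  val literal: String = name.replace(".", "_")

  // replaceAll with an escaped dot: the regex matches only '.'
  val escaped: String = name.replaceAll("\\.", "_")

  // replaceAll with an unescaped dot: the regex matches every character
  val wrong: String = name.replaceAll(".", "_")

  def main(args: Array[String]): Unit = {
    println(literal) // user_name_last
    println(escaped) // user_name_last
    println(wrong)   // 14 underscores, one per character
  }
}
```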
2. Using selectExpr:
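import org.apache.spark.sql.functions.col
// backticks make Spark treat the dotted name as one column, not a struct field path
val expr = df.columns.map(x => col(s"`${x}`").alias(x.replace(".", "_")).toString)
df.selectExpr(expr: _*).show()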
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
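Approach 2 gets its SQL strings by calling toString on an aliased Column, so it depends on how Spark renders that Column as text. A sketch that builds equivalent selectExpr strings directly, assuming Spark SQL's rule that a backtick inside a backtick-quoted identifier is written as two backticks; `quote` and `renameExpr` are hypothetical helper names:

```scala
object RenameExprs {
  // Quote an identifier for Spark SQL: wrap it in backticks, doubling any embedded backtick
  def quote(ident: String): String = "`" + ident.replace("`", "``") + "`"

  // Build "`old.name` AS `old_name`" for use with DataFrame.selectExpr
  def renameExpr(column: String): String =
    s"${quote(column)} AS ${quote(column.replace(".", "_"))}"

  def main(args: Array[String]): Unit = {
    val columns = Array("user.id_number", "user.name.last", "user.phone.mobile")
    columns.map(renameExpr).foreach(println)
  }
}
```

With a DataFrame in scope, the strings would be used as `df.selectExpr(df.columns.map(RenameExprs.renameExpr): _*)`. Building the text explicitly avoids relying on Column.toString, whose exact output is not a documented contract.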
3. Using .withColumnRenamed:
// fold over the column names, renaming one column per step
df.columns.foldLeft(df) { (tmpdf, c) => tmpdf.withColumnRenamed(c, c.replace(".", "_")) }.show()
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
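All three approaches apply the same mapping to every column name, so two different original names can end up with the same new name (for example user.name and user_name both become user_name), leaving the DataFrame with duplicate column names. A minimal pure-Scala sketch of a check to run on the column names before renaming; `underscoreNames` is a hypothetical helper, not a Spark API:

```scala
object UnderscoreNames {
  // Map each column name to its dot-free form, failing if two names collide
  def underscoreNames(columns: Seq[String]): Seq[String] = {
    val renamed = columns.map(_.replace(".", "_"))
    // group original names by their new name and keep only the groups with more than one member
    val clashes = columns.zip(renamed)
      .groupBy { case (_, newName) => newName }
      .collect { case (newName, pairs) if pairs.size > 1 => newName -> pairs.map(_._1) }
    require(clashes.isEmpty, s"renaming would create duplicate columns: $clashes")
    renamed
  }

  def main(args: Array[String]): Unit = {
    println(underscoreNames(Seq("user.id_number", "user.name.last", "user.phone.mobile")))
  }
}
```

In Spark it could be combined with approach 1 as `df.toDF(UnderscoreNames.underscoreNames(df.columns.toSeq): _*)`, which fails fast with an IllegalArgumentException instead of silently producing ambiguous columns.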
Regarding "scala - Spark change DF schema column rename from dot to underscore", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62744361/