scala - Spark: rename DataFrame schema columns by replacing dots with underscores

I have a DataFrame whose column names contain dots. Example output of df.printSchema:

user.id_number
user.name.last
user.phone.mobile

and so on. I want to rename the schema columns by replacing each dot with an underscore:

user_id_number
user_name_last
user_phone_mobile

Note: the input data for this DataFrame is in JSON format (non-relational data, for example from a NoSQL store).

Best answer

Use .map with .toDF, selectExpr, or .withColumnRenamed to replace . with _.

Examples:

1. Using toDF with map:

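// In spark-shell the implicits are already in scope; elsewhere, import spark.implicits._ for toDF
val df = Seq(("1","2","3")).toDF("user.id_number","user.name.last","user.phone.mobile")
// String.replace substitutes the literal "." (no regex involved)
df.toDF(df.columns.map(x => x.replace(".","_")): _*).show()

// using replaceAll, which takes a regex, so the dot must be escaped
df.toDF(df.columns.map(x => x.replaceAll("\\.","_")): _*).show()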
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
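
One thing worth checking before renaming: replacing dots can make two distinct column names collide (for example, a.b and a_b both become a_b), and toDF may then leave the DataFrame with ambiguous duplicate names. A minimal sketch of a guard in plain Scala follows. The helper name underscoreNames is hypothetical and not part of Spark.

```scala
// Hypothetical helper (not a Spark API): maps dotted names to underscore names
// and reports collisions instead of silently producing duplicates.
def underscoreNames(columns: Seq[String]): Either[String, Seq[String]] = {
  val renamed = columns.map(_.replace(".", "_"))
  // Group by the new name and keep only names that were produced more than once.
  val clashes = renamed
    .groupBy(identity)
    .collect { case (name, hits) if hits.size > 1 => name }
    .toSeq
    .sorted
  if (clashes.isEmpty) Right(renamed)
  else Left(s"duplicate column names after renaming: ${clashes.mkString(", ")}")
}
```

With a DataFrame df in scope, the result could be used as underscoreNames(df.columns.toSeq).map(names => df.toDF(names: _*)), which yields a Left with a message when two names would clash.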

2. Using selectExpr:

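// Build SQL expressions such as `user.id_number` AS `user_id_number` directly, rather than
// relying on Column.toString; the backticks keep the dot from being parsed as a field access
val expr = df.columns.map(x => s"`$x` AS `${x.replace(".", "_")}`")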

df.selectExpr(expr:_*).show()
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
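
The same rename can also be expressed without SQL strings by passing Column objects to select. The sketch below assumes a local SparkSession created only for illustration (in spark-shell, spark already exists) and reuses the sample data from the answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; in spark-shell `spark` is predefined.
val spark = SparkSession.builder().master("local[*]").appName("rename-dots").getOrCreate()
import spark.implicits._

val df = Seq(("1", "2", "3")).toDF("user.id_number", "user.name.last", "user.phone.mobile")

// Backticks make Spark read the dotted name as one column, not as a nested field path.
val renamedCols = df.columns.map(c => col(s"`$c`").alias(c.replace(".", "_")))
val renamed = df.select(renamedCols: _*)
renamed.printSchema()
```

Compared with selectExpr, this keeps the expressions typed as Column values, so no string has to be parsed back into an expression.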

3. Using withColumnRenamed:

// The lambda parameter is called name so that it does not shadow functions.col
df.columns.foldLeft(df){ (tmpdf, name) => tmpdf.withColumnRenamed(name, name.replace(".","_")) }.show()
//+--------------+--------------+-----------------+
//|user_id_number|user_name_last|user_phone_mobile|
//+--------------+--------------+-----------------+
//|             1|             2|                3|
//+--------------+--------------+-----------------+
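
The question mentions that the data comes from JSON. In that case the dots may not be part of the column names at all: Spark reads nested JSON objects as struct columns, and a path such as user.name.last then refers to a field inside a struct. The renames above only touch top-level column names, so nested fields would need to be selected by path and aliased. A minimal sketch under that assumption follows. flattenPaths is a hypothetical helper, the sample record is made up, and the code does not descend into arrays or handle field names that themselves contain dots.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StructField, StructType}

// Assumption: a local session for illustration; in spark-shell `spark` is predefined.
val spark = SparkSession.builder().master("local[*]").appName("flatten-json").getOrCreate()
import spark.implicits._

// Hypothetical helper: lists the dotted path of every leaf field in a schema.
def flattenPaths(schema: StructType, prefix: String = ""): Seq[String] =
  schema.fields.toSeq.flatMap {
    case StructField(name, inner: StructType, _, _) => flattenPaths(inner, s"$prefix$name.")
    case StructField(name, _, _, _)                 => Seq(s"$prefix$name")
  }

// Sample record invented for this example.
val json = Seq("""{"user":{"id_number":"1","name":{"last":"2"},"phone":{"mobile":"3"}}}""")
val nested = spark.read.json(json.toDS())

// Select each leaf by its path and alias it with underscores in place of dots.
val flatCols = flattenPaths(nested.schema).map(p => col(p).alias(p.replace(".", "_")))
val flat = nested.select(flatCols: _*)
flat.printSchema()
```

Here the paths are deliberately not wrapped in backticks, because each dot is meant to be read as a struct field access.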

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/62744361/
