dataframe - How to ensure the correct column order when executing Spark dataframe.write().insertInto("table")?

Tags: dataframe apache-spark databricks azure-databricks

I am using the following code to insert DataFrame data directly into a Databricks Delta table:

eventDataFrame.write.format("delta").mode("append").option("inferSchema","true").insertInto("some delta table")

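However, if the column order the Delta table was created with differs from the DataFrame's column order, the values get mixed up and are not written to the correct columns. How can I preserve the order? Is there a standard way or best practice for doing this?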

Best answer

This is simple:

# In PySpark

# Target table that the data will finally be inserted into
df = spark.read.table("TARGET_TABLE")

# df_increment is the DataFrame to insert; its columns are in arbitrary order.
# Reorder them to match the target table, because insertInto matches by position.
df_increment = df_increment.select(df.columns)
df_increment.write.insertInto("TARGET_TABLE")
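The select(df.columns) step above fails with an analysis error if the incoming DataFrame lacks one of the target columns, and it silently drops any extra columns. As a minimal sketch (not part of the original answer), the column lists could be checked up front with a small pure-Python helper. The name align_to_target is hypothetical, and the case-insensitive matching assumes Spark's default setting of spark.sql.caseSensitive=false.

```python
from typing import Iterable, List


def align_to_target(source_cols: Iterable[str], target_cols: Iterable[str]) -> List[str]:
    """Return the source column names reordered to match the target's order.

    Hypothetical helper: matches names case-insensitively and raises
    ValueError if the two column sets do not line up one-to-one.
    """
    by_lower = {c.lower(): c for c in source_cols}
    target = list(target_cols)
    # Target columns that the source DataFrame does not provide
    missing = [c for c in target if c.lower() not in by_lower]
    # Source columns that the target table does not have
    extra = sorted(set(by_lower) - {c.lower() for c in target})
    if missing or extra:
        raise ValueError(f"missing in source: {missing}; not in target: {extra}")
    return [by_lower[c.lower()] for c in target]


# Possible usage with Spark (assumes an active `spark` session):
# ordered = align_to_target(df_increment.columns, spark.read.table("TARGET_TABLE").columns)
# df_increment.select(*ordered).write.insertInto("TARGET_TABLE")
```

Failing early with an explicit message is a design choice here; whether extra columns should be an error or simply dropped depends on the pipeline.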

So in your case it would be:

parent_df = spark.read.table("some delta table")
eventDataFrame.select(parent_df.columns).write.format("delta").mode("append").option("inferSchema","true").insertInto("some delta table")
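As a possible alternative that the original answer does not mention: the Spark documentation for DataFrameWriter.saveAsTable states that, unlike insertInto, it uses column names to find the correct column positions when appending to an existing table. A minimal sketch under that assumption follows. The function name append_by_name is hypothetical, and whether this fits depends on the table format and on how the table is registered in the metastore.

```python
def append_by_name(df, table_name: str) -> None:
    """Append `df` to an existing table, relying on name-based column matching.

    Hypothetical wrapper: `df` is expected to be a Spark DataFrame, and
    `table_name` an existing Delta table registered in the metastore.
    """
    df.write.format("delta").mode("append").saveAsTable(table_name)


# Possible usage (assumes `eventDataFrame` exists in the session):
# append_by_name(eventDataFrame, "some delta table")
```

With this approach the DataFrame's column order should no longer matter, but the column names and types still need to be compatible with the target table.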

Regarding "dataframe - How to ensure the correct column order when executing Spark dataframe.write().insertInto("table")?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58656660/
