I need to merge 5 DataFrames into one. The DataFrames look like:
+-------------------+---------------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+---------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field1 with beats|
+-------------------+---------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+------------------------------------------------------------------------+
|2020-03-04 23:10:59| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field2 with kobo |
+-------------------+------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field3 with beats|
+-------------------+------------------------------------------------------------------------+
+-------------------+-------------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+-------------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added an field4 with beats|
+-------------------+-------------------------------------------------------------------+
+-------------------+---------------------------------------------------------------+
|Timestamp |sentence |
+-------------------+---------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added a field5 with beats|
+-------------------+---------------------------------------------------------------+
When the union is applied to the first 3 DataFrames, show works fine, but when the last two are included, the Spark job makes no progress.
The union I am using (note that reduce comes from functools):

from functools import reduce

dfs = [df1, df2, df3, df4, df5]
df_final = reduce(lambda a, b: a.union(b), dfs)
df_final.show()
I want to display the result, but the job gets stuck at

showString at NativeMethodAccessorImpl.java:0

How can I fix this?
Best Answer
It looks fine to me, since you have the same data types for union and the same column names for unionByName. I don't think this is a problem with union or unionByName; there may be some other issue. From the scheduler's point of view, it could be a resource crunch. Check whether other jobs are running in parallel.
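As a side note, the reduce pattern in the question is itself sound. A minimal, Spark-free sketch of the same left fold (using plain Python lists of rows in place of DataFrames, with list concatenation standing in for union, which likewise keeps duplicates) shows how the five frames collapse into one:

```python
from functools import reduce

# Stand-ins for df1..df5: each "frame" is a list of (timestamp, sentence) rows.
frames = [
    [("2020-03-13 12:01:32", "updated field1 with beats")],
    [("2020-03-04 23:10:59", "updated field2 with kobo")],
    [("2020-03-13 12:01:32", "updated field3 with beats")],
    [("2020-02-20 07:20:29", "added an field4 with beats")],
    [("2020-02-20 07:20:29", "added a field5 with beats")],
]

# Same left fold as df_final = reduce(lambda a, b: a.union(b), dfs),
# with + playing the role of union (no deduplication, just like union).
merged = reduce(lambda a, b: a + b, frames)
print(len(merged))  # one row from each of the five frames
```

Because the fold is trivially correct, a hang at show usually points to the execution environment (cluster resources, skew, or a blocked stage) rather than the union logic.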
Regarding "python - Spark job not ending: show of dataframe", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61537189/