apache-spark - Spark exception when inserting dataframe results into a Hive table

Tags: apache-spark pyspark apache-spark-sql

Here is a snippet of my code. When I execute spark.sql(query), I get the exception below.

My table_v2 has 262 columns and my table_v3 has 9 columns.

Has anyone encountered a similar problem and can help me resolve it? Thanks in advance.

from pyspark.sql import SparkSession

# Hive-enabled session
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
sc = spark.sparkContext

df1 = spark.sql("select * from myDB.table_v2")
df2 = spark.sql("select * from myDB.table_v3")

# Inner join on the three key columns, then keep only df1's columns
# so the result has no duplicate column names
result_df = df1.join(df2, (df1.id_c == df2.id_c) & (df1.cycle_r == df2.cycle_r) & (df1.consumer_r == df2.consumer_r))
final_result_df = result_df.select(df1["*"])

final_result_df.distinct().createOrReplaceTempView("results")
query = "INSERT INTO TABLE myDB.table_v2_final select * from results"
spark.sql(query)

I tried setting the following parameter in the conf, but it did not help:

spark.sql.debug.maxToStringFields=500
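
For reference, a minimal sketch of how that setting can be applied, either at session build time or (in recent Spark versions) at runtime. Note that spark.sql.debug.maxToStringFields only controls how aggressively Spark truncates plan/debug strings, so it would not be expected to fix the write failure itself:

from pyspark.sql import SparkSession

# Set at session build time ...
spark = (SparkSession.builder
         .enableHiveSupport()
         .config("spark.sql.debug.maxToStringFields", "500")
         .getOrCreate())

# ... or adjust on an existing session (works on recent Spark versions)
spark.conf.set("spark.sql.debug.maxToStringFields", "500")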

Error:

20/12/16 19:28:20 ERROR FileFormatWriter: Job job_20201216192707_0002 aborted.
20/12/16 19:28:20 ERROR Executor: Exception in task 90.0 in stage 2.0 (TID 225)
org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Missing required char ':' at 'struct<>
    at org.apache.orc.TypeDescription.requireChar(TypeDescription.java:293)
    at org.apache.orc.TypeDescription.parseStruct(TypeDescription.java:326)
    at org.apache.orc.TypeDescription.parseType(TypeDescription.java:385)
    at org.apache.orc.TypeDescription.fromString(TypeDescription.java:406)
    at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.org$apache$spark$sql$execution$datasources$orc$OrcSerializer$$createOrcValue(OrcSerializer.scala:226)
    at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.<init>(OrcSerializer.scala:36)
    at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.<init>(OrcOutputWriter.scala:36)
    at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:108)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
    ... 8 more
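
The telling line is the IllegalArgumentException: the ORC writer is being handed an empty schema ('struct<>'), which usually means the target Hive table was created with no columns, or with a definition Spark cannot read. A quick way to check, assuming the table name from the question:

# Inspect the schema Spark sees for the target table; an empty column
# list here would match the 'struct<>' in the exception
spark.sql("DESCRIBE FORMATTED myDB.table_v2_final").show(100, truncate=False)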

Best Answer

I dropped my myDB.table_v2_final table, changed the following line in my code, and it worked.

I suspect there was some problem with the way I had originally created the table.

query = "create external table myDB.table_v2_final as select * from results"

Regarding apache-spark - Spark exception when inserting dataframe results into a Hive table, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65344527/
