I have an external table pointing to an S3 location (Parquet files), and all of its columns are typed as string. I want to correct the data types of all the columns instead of reading everything as strings. When I drop the external table and recreate it with the new data types, select queries always throw an error like the one below:
java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:48)
at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:233)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
Best Answer
Specify the type as bigint, which is equivalent to the long type; Hive does not have a long data type.
hive> alter table table change col col bigint;
(Duplicate content, adapted from a Hortonworks forum thread.)
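As a sketch of the fix the answer describes, the `ALTER TABLE ... CHANGE` statement can be applied per column to replace the string type with the type that matches the Parquet data. The table and column names below are hypothetical, not from the question:

```sql
-- Hypothetical external table "sales_ext"; adapt names to your schema.
-- Use bigint (not a "long" type, which Hive lacks) for 64-bit integers,
-- so the declared type matches what the Parquet files actually store.
ALTER TABLE sales_ext CHANGE order_id order_id bigint;
ALTER TABLE sales_ext CHANGE quantity quantity bigint;

-- Verify the updated schema before querying again:
DESCRIBE sales_ext;
```

Note that the declared Hive type must agree with the Parquet file's physical type: the stack trace above (`PlainBinaryDictionary` ... `decodeToInt`) is what Spark's vectorized reader throws when the table schema claims an integer type for a column the files store as binary/string.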
Regarding "sql - Change column data type in a Parquet file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49587501/