Getting ClassCastException in Spark 2 when processing a table with complex data type columns (e.g. array&lt;struct&gt; and array&lt;string&gt;)
The action I tried is very simple: a count.
df=spark.sql("select * from <tablename>")
df.count
But the following error occurs when running the Spark application:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at org.apache.spark.sql.hive.HiveInspectors$$anonfun$unwrapperFor$23.apply(HiveInspectors.scala:529)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:419)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:419)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:435)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
Strangely, the same DataFrame operation runs fine in
spark-shell
.
The table has the following complex columns:
|-- sku_product: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- sku_id: string (nullable = true)
| | |-- qty: string (nullable = true)
| | |-- price: string (nullable = true)
| | |-- display_name: string (nullable = true)
| | |-- sku_displ_clr_desc: string (nullable = true)
| | |-- sku_sz_desc: string (nullable = true)
| | |-- parent_product_id: string (nullable = true)
| | |-- delivery_mthd: string (nullable = true)
| | |-- pick_up_store_id: string (nullable = true)
| | |-- delivery: string (nullable = true)
|-- hitid_low: string (nullable = true)
|-- evar7: array (nullable = true)
| |-- element: string (containsNull = true)
|-- hitid_high: string (nullable = true)
|-- evar60: array (nullable = true)
| |-- element: string (containsNull = true)
Let me know if any further information is needed.
Best Answer
I had a similar problem. I was using Spark 2.1 with Parquet files.
I found that one of the Parquet files had a schema different from the others, so when I tried to read them all at once I got the cast error.
To work around it, I simply checked the files one by one.
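The per-file check described above can be sketched as follows. This is a minimal sketch, assuming each file's schema has already been collected with something like spark.read.parquet(path).schema.simpleString(); the helper itself and the file names in the test are hypothetical, not part of the original answer:

```scala
// Sketch: given a map of file path -> schema string (e.g. obtained per file
// with spark.read.parquet(path).schema.simpleString()), return the files
// whose schema disagrees with the most common one.
def findSchemaOutliers(schemas: Map[String, String]): List[String] = {
  if (schemas.isEmpty) Nil
  else {
    // Treat the schema shared by the largest number of files as the reference.
    val majority = schemas.values.groupBy(identity).maxBy(_._2.size)._1
    schemas.collect { case (file, s) if s != majority => file }.toList
  }
}
```

Reading each file individually is slow but pinpoints the offending file. For Parquet specifically, spark.read.option("mergeSchema", "true") can reconcile compatible schemas, though it will not help when a column's type genuinely conflicts (e.g. string in one file vs array&lt;string&gt; in another), which is the kind of mismatch behind this ClassCastException.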
For scala - Getting ClassCastException in Spark 2: java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47832704/