excel - spark-excel data type issue

Tags: excel apache-spark apache-spark-sql apache-poi spark-excel

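I am using the spark-excel package with Spark 2.2 to process MS Excel files. Some files fail to load as a Spark DataFrame and throw the exception below. If anyone has run into this, could you help resolve this kind of data type issue?

After some analysis, I found that the exception occurs when a column name (a header cell) is not a string. If I manually change the column name from an integer to a string, the file loads fine.

Code: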

  val excelDF = spark.read.
    format("com.crealytics.spark.excel").
    option("useHeader", "true").
    option("treatEmptyValuesAsNulls", "true").
    option("inferSchema", "true").
    option("addColorColumns", "False").
    option("sheetName", sheetName).
    load(filePath)

Exception:
java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell
    at org.apache.poi.xssf.usermodel.XSSFCell.typeMismatch(XSSFCell.java:1077)
    at org.apache.poi.xssf.usermodel.XSSFCell.getRichStringCellValue(XSSFCell.java:395)
    at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.java:347)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$10.apply(ExcelRelation.scala:206)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$10.apply(ExcelRelation.scala:205)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:205)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:204)
    at scala.Option.getOrElse(Option.scala:121)
    at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:204)
    at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:91)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:37)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:14)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)

Best answer

The newer release of the library, com.crealytics:spark-excel_2.11:0.12.5, also works with non-string column/header names.
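As a minimal sketch of how that version could be pulled into an sbt build: the artifact coordinates come from the answer above, while the exact Scala patch version is an assumption chosen only to match the `_2.11` suffix. Whether 0.12.5 is compatible with your Spark version should be checked separately.

```scala
// build.sbt (sketch)
// Assumption: a Scala 2.11.x build, so that %% resolves to the spark-excel_2.11 artifact.
scalaVersion := "2.11.12"

// Coordinates taken from the answer: com.crealytics:spark-excel_2.11:0.12.5
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.12.5"
```

For interactive use, the same coordinates can be passed to spark-shell with `--packages com.crealytics:spark-excel_2.11:0.12.5`.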

Source: the original question, "excel - spark-excel data type issue", on Stack Overflow: https://stackoverflow.com/questions/48301217/
