I have the following case class:
case class OrderDetails(OrderID : String, ProductID : String, UnitPrice : Double,
Qty : Int, Discount : Double)
I am trying to read this CSV: https://github.com/xsankar/fdps-v3/blob/master/data/NW-Order-Details.csv
Here is my code:
val spark = SparkSession.builder.master(sparkMaster).appName(sparkAppName).getOrCreate()
import spark.implicits._
val orderDetails = spark.read.option("header","true").csv(inputFiles + "NW-Order-Details.csv").as[OrderDetails]
The error is:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot up cast `UnitPrice` from string to double as it may truncate
The type path of the target object is:
- field (class: "scala.Double", name: "UnitPrice")
- root class: "es.own3dh2so4.OrderDetails"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
If all the fields are `Double` values, why can't it cast? What am I missing?
Spark 2.1.0, Scala 2.11.7
Best Answer
You just need to cast your non-string fields explicitly. Without a schema, `spark.read.csv` reads every column as a string, so the same upcast error would also be raised for `Qty` and `Discount`:

import org.apache.spark.sql.types.{DoubleType, IntegerType}

val orderDetails = spark.read
  .option("header", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .withColumn("UnitPrice", 'UnitPrice.cast(DoubleType))
  .withColumn("Qty", 'Qty.cast(IntegerType))
  .withColumn("Discount", 'Discount.cast(DoubleType))
  .as[OrderDetails]
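As an alternative sketch, assuming the CSV values parse cleanly, you can let Spark infer the column types at read time with the `inferSchema` option, which avoids the manual casts (at the cost of an extra pass over the file):

```scala
// Let Spark sample the data and infer column types
// (UnitPrice and Discount become double, Qty becomes int).
val orderDetails = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .as[OrderDetails]
```

Note that inference is driven by the data, so a single malformed row can widen a column back to string.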
As a side note, per Scala (and Java) convention, your case class constructor parameters should be lower camelCase:
case class OrderDetails(orderID: String,
                        productID: String,
                        unitPrice: Double,
                        qty: Int,
                        discount: Double)
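A third option, if you want deterministic types without either casting or inference, is to pass an explicit schema to the reader; the schema's column names then replace the header names, so they can match the case class fields directly. A minimal sketch, assuming the camelCase case class above:

```scala
import org.apache.spark.sql.types._

// Declare every column's type up front so .as[OrderDetails]
// resolves without any up-cast.
val schema = StructType(Seq(
  StructField("orderID", StringType),
  StructField("productID", StringType),
  StructField("unitPrice", DoubleType),
  StructField("qty", IntegerType),
  StructField("discount", DoubleType)
))

val orderDetails = spark.read
  .option("header", "true")
  .schema(schema)
  .csv(inputFiles + "NW-Order-Details.csv")
  .as[OrderDetails]
```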
Regarding scala - explicit cast when reading a .csv into a case class with Spark 2.1.0, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43169409/