I'm currently writing a simple program to pull data out of a Hive DB, and I need the date formatting from Joda-Time... I have something like this:
import org.joda.time._
import org.joda.convert._
import org.joda.time.format.DateTimeFormat._

object DateExtract {
  // change depending on which segment you wish to capture, i.e. weekly, monthly, etc.
  def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
    val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
    1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
  }
}
However, when I pull the data into an RDD, the call to this function fails with an error saying the object is not serializable:
Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$DateExtract$
Best answer
So I figured it out... basically either have the object extend Serializable, like this:
object DateExtract extends java.io.Serializable {
  // change depending on which segment you wish to capture, i.e. weekly, monthly, etc.
  def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
    val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
    1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
  }
}
Or simply define the function on its own, without a wrapping object:
def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
  val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
  1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
}
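For comparison, here is a minimal sketch of the same range logic written against the JDK's java.time API (Java 8+) instead of Joda-Time; `LocalDate` implements `java.io.Serializable` out of the box, so nothing extra needs to be declared for Spark closures. The `DateExtractJdk` name is made up for this example.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical java.time counterpart of the Joda-based DateExtract above.
object DateExtractJdk {
  // Returns every date after startDate up to and including endDate,
  // mirroring the `1 to daysBetween` stepping in the Joda version.
  def datesBetween(startDate: LocalDate, endDate: LocalDate): Seq[LocalDate] = {
    val daysBetween = ChronoUnit.DAYS.between(startDate, endDate)
    (1L to daysBetween).map(d => startDate.plusDays(d))
  }
}
```

For example, `DateExtractJdk.datesBetween(LocalDate.of(2014, 1, 1), LocalDate.of(2014, 1, 4))` yields the three dates January 2 through January 4.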
This works because Spark serializes the closures it ships to executors, and a closure that calls a method on an enclosing object drags that whole object along with it. Interestingly, Joda-Time's DateTime is itself serializable; it is the enclosing object (here the REPL-generated `$iwC...` wrapper around DateExtract) that has to declare java.io.Serializable explicitly.
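The failure mode can be reproduced without Spark at all: plain Java serialization rejects any object whose class does not implement `java.io.Serializable`, which is exactly what happens to the wrapper around DateExtract. A small sketch (the `Plain` and `Marked` class names are made up for illustration):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical stand-ins: Plain lacks the marker interface, Marked has it.
  class Plain { def greet: String = "hi" }
  class Marked extends java.io.Serializable { def greet: String = "hi" }

  // Attempts Java serialization and reports whether it succeeded.
  def canSerialize(x: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(x)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Serializing a `Plain` instance throws `NotSerializableException`, while `Marked` goes through, which is the same distinction Spark hits when it serializes a task's closure.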
On the topic of scala - Jodatime Scala and serializing DateTime, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25107028/