I'm working with Apache Spark and Apache Kylin, and I need to store a CSV file in HDFS so that I can use it to build a cube in Kylin. My idea is to convert the RDD I have into a CSV file, which I tried to do as follows:
bookingDF.write().format("com.databricks.spark.csv").option("header", "true").save("hdfs://10.7.30.131:8020/tmp/hfile/e.csv");
But I always get a long error like the one below, and I think it is caused by the Date field of the object I'm using:
17/01/19 14:50:24 ERROR Utils: Aborting task
scala.MatchError: Fri Dec 09 07:45:27 CET 2016 (of class java.util.Date)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)...
Here is the code of the Java object I'm using:
@JsonIgnoreProperties(ignoreUnknown = true)
public class Booking implements Serializable {

    private String bPk;
    private String type;
    private String transactionId;
    private Boolean revisit;
    private String device;
    @JsonProperty("serverTime")
    private Date time;
    private String trackingId;
    private String browserFamily;
    @JsonProperty("action")
    private String measure;
    private String userId;

    public String getUserId() {
        return userId;
    }

    public void setUserId(String userId) {
        this.userId = userId;
    }

    public String getMeasure() {
        return measure;
    }

    public void setMeasure(String measure) {
        this.measure = measure;
    }

    public String getBrowserFamily() {
        return browserFamily;
    }

    public void setBrowserFamily(String browserFamily) {
        this.browserFamily = browserFamily;
    }

    public void setTime(Date time) {
        this.time = time;
    }

    public String getTrackingId() {
        return trackingId;
    }

    public void setTrackingId(String trackingId) {
        this.trackingId = trackingId;
    }

    public Date getTime() {
        return time;
    }
    ....
I'm not sure what I'm doing wrong. I tried converting java.util.Date to java.sql.Date, but I still get the same error, only now it mentions java.sql.Date.
Best answer
Could you try using java.text.SimpleDateFormat?
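One way to apply that suggestion, offered as a rough sketch rather than a confirmed fix: format the Date as text yourself before the rows reach the CSV writer, so that the column is a plain String. The class name DateColumnFormatter, the pattern, and the UTC time zone are assumptions chosen for illustration; use whatever layout your Kylin table expects.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateColumnFormatter {
    // Hypothetical pattern, not taken from the question.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Turns a java.util.Date into text so the CSV column is a plain String.
    // A new SimpleDateFormat is created per call because the class is not thread-safe.
    static String format(Date date) {
        if (date == null) {
            return null; // keep missing values as null
        }
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed zone, for stable output
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // The Unix epoch, formatted in UTC.
        System.out.println(format(new Date(0L))); // 1970-01-01 00:00:00
    }
}
```

In the Booking class this could mean exposing the time through a String getter instead of the Date getter, so that the field is read as ordinary text.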
From the spark-csv source code:
dateFormat: specifies a string that indicates the date format to use when reading dates or timestamps. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to both DateType and TimestampType. By default, it is null which means trying to parse times and date by java.sql.Timestamp.valueOf() and java.sql.Date.valueOf().
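To make the quoted default concrete, here is a small sketch that uses only JDK classes, with no Spark involved. It shows the text layouts that java.sql.Timestamp.valueOf() and java.sql.Date.valueOf() accept, next to a custom SimpleDateFormat pattern for input in a different layout. The class name DateParsingDemo and the dd/MM/yyyy pattern are illustrative assumptions.

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class DateParsingDemo {
    // Default case: the text must look like "yyyy-[m]m-[d]d hh:mm:ss[.f...]".
    static Timestamp parseDefault(String text) {
        return Timestamp.valueOf(text);
    }

    // Custom case: any SimpleDateFormat pattern, read in the JVM default time zone.
    static Timestamp parseCustom(String text, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setLenient(false); // reject impossible dates such as 31/02/2016
        try {
            return new Timestamp(fmt.parse(text).getTime());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "' with " + pattern, e);
        }
    }

    public static void main(String[] args) {
        Timestamp a = parseDefault("2016-12-09 07:45:27");
        Timestamp b = parseCustom("09/12/2016 07:45:27", "dd/MM/yyyy HH:mm:ss");
        System.out.println(a.equals(b)); // both describe the same instant
        System.out.println(java.sql.Date.valueOf("2016-12-09"));
    }
}
```

Text in any other layout makes the valueOf() methods throw IllegalArgumentException, which is the situation an explicit date format is meant to handle.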
Source: the Stack Overflow question "java - RDD to CSV JAVA": https://stackoverflow.com/questions/41744795/