I'm trying to compute a column based on a date difference. Is there a datediff-like function that works on columns/DataFrames? For example:
Column new = old.col("one").divide(old.col("max").minus(old.col("min")));
But in this case the minus function does not work, because the min and max columns contain dates. So I need something like datediff for a Column. Is there such a thing?
Thanks!
Best answer
Yes, it is called datediff (org.apache.spark.sql.functions.datediff):
public static Column datediff(Column end, Column start)
Returns the number of days from start to end.
Parameters:
end - (undocumented)
start - (undocumented)
Returns:
(undocumented)
Since:
1.5.0
Example:
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SQLContext;
import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.DataFrame;

public class App {
    public static void main(String[] args) {
        // SparkContext needs an application name as well as a master when run standalone
        SparkConf conf = new SparkConf().setMaster("local").setAppName("App");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Build a one-row DataFrame with two DATE columns
        DataFrame df = sqlContext.sql(
            "SELECT CAST('2012-01-01' AS DATE), CAST('2013-08-02' AS DATE)").toDF("first", "second");

        // datediff(end, start): "first" is passed as end here, so the result is negative
        df.select(datediff(df.col("first"), df.col("second"))).show();

        sc.stop();
    }
}
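Note the argument order: datediff takes the end date first and the start date second. The example above passes "first" (the earlier date) as end, so the difference it prints should be negative. The following is a minimal plain-Java sketch of that behavior using java.time rather than Spark; the class DateDiffSketch and the helper dateDiff are hypothetical names introduced only for illustration, and the sketch assumes Spark counts whole calendar days the same way.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DateDiffSketch {
    // Hypothetical helper mirroring the documented order datediff(end, start):
    // returns the number of days from start to end.
    public static long dateDiff(LocalDate end, LocalDate start) {
        return ChronoUnit.DAYS.between(start, end);
    }

    public static void main(String[] args) {
        LocalDate first = LocalDate.parse("2012-01-01");
        LocalDate second = LocalDate.parse("2013-08-02");

        // Same order as the Spark example: the earlier date is passed as end
        System.out.println(dateDiff(first, second)); // -579

        // Swapped order: the later date is passed as end
        System.out.println(dateDiff(second, first)); // 579
    }
}
```

For the expression in the question, this suggests replacing old.col("max").minus(old.col("min")) with datediff(old.col("max"), old.col("min")), so that the later date is the end argument.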
Source: the Stack Overflow question "java - Apache Spark - datediff for dataframes?": https://stackoverflow.com/questions/38311425/