我有一个数据框
+-----+----+----------+------------+----------+------------+
|empId| lId| date1| time1 | date2 | time2 |
+-----+----+----------+------------+----------+------------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077|
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077|
+-----+----+----------+------------+----------+------------+
需要找到每个 empId 的 2 个日期时间(以分钟为单位)之间的时间差并另存为新列。 所需的输出:
+-----+----+----------+------------+----------+------------+---------+
|empId| lId| date1| time1 | date2 | time2 |TimeDiff |
+-----+----+----------+------------+----------+------------+---------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077|60 |
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077|120 |
+-----+----+----------+------------+----------+------------+---------+
最佳答案
您可以concat
date
和time
并将其转换为timestamp
并找到差异
分钟如下
import org.apache.spark.sql.functions._
val format = "yyyy-MM-dd HH:mm:ss.SSS" //datetime format after concat
val newDF = df1.withColumn("TimeDiffInMinute",
abs(unix_timestamp(concat_ws(" ", $"date1", $"time1"), format).cast("long")
- (unix_timestamp(concat_ws(" ", $"date2", $"time2"), format)).cast("long") / 60D
)
unix_timestamp
将 datetime
转换为 timestamp
,减去 timestamp
结果为 seconds
并在 分钟
内除以 60 结果。
输出:
+-----+----+----------+------------+----------+------------+---------+
|empId| lId| date1| time1| date2| time2|dateTime1|
+-----+----+----------+------------+----------+------------+---------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077| 60.0|
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077| 120.0|
+-----+----+----------+------------+----------+------------+---------+
希望这对您有所帮助!
关于scala - 如何在 Scala 中找到两个日期时间之间的时差?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50350380/