date - What logic does Hive's unix_timestamp function use to convert a 2-digit year to a 4-digit year?

Tags: date hadoop hive

For example, take the following Hive script:

select 
from_unixtime(unix_timestamp('30-Apr-50', 'dd-MMM-yy'), 'yyyy-MM-dd') as date1,
from_unixtime(unix_timestamp('30-Apr-45', 'dd-MMM-yy'), 'yyyy-MM-dd') as date2,
from_unixtime(unix_timestamp('30-Apr-35', 'dd-MMM-yy'), 'yyyy-MM-dd') as date3;

It returns the following result:

date1       date2       date3
1950-04-30  1945-04-30  2035-04-30

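What logic does the unix_timestamp function use to convert a 2-digit year to a 4-digit year? Is there a fixed threshold below which a 2-digit year is converted to 20**? If so, what is the threshold? Is there a parameter (something like a 'Century Break' setting) that lets us choose the century based on some condition?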

Best answer

Year: ...
For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century.
It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created.
For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964.

Source: Java SimpleDateFormat documentation
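The rule quoted above can be reproduced in plain Java. The sketch below is only an illustration, not Hive code: the class name `TwoDigitYearDemo` and the helper `parse` are hypothetical. It uses `SimpleDateFormat.set2DigitYearStart`, which is part of the standard Java API, to pin the start of the 100-year window instead of relying on the time the formatter is created. The window start of 1937-03-28 12:00 UTC is an assumption chosen to mimic a formatter created around midday on 2017-03-28. Nothing in the question or answer indicates that Hive exposes this setting.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TwoDigitYearDemo {

    // Hypothetical helper: parse `text` with `pattern`, with the 100-year
    // window starting at `windowStartIso` (ISO local date-time, read as UTC),
    // and return the result formatted as yyyy-MM-dd.
    static String parse(String text, String pattern, String windowStartIso) {
        SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ENGLISH);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        // By default the window starts 80 years before the formatter is created;
        // here it is set explicitly so the outcome does not depend on today's date.
        Date start = Date.from(LocalDateTime.parse(windowStartIso).toInstant(ZoneOffset.UTC));
        in.set2DigitYearStart(start);

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return out.format(in.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String start = "1937-03-28T12:00"; // assumed window start
        System.out.println(parse("30-Apr-50", "dd-MMM-yy", start)); // 1950-04-30
        System.out.println(parse("30-Apr-45", "dd-MMM-yy", start)); // 1945-04-30
        System.out.println(parse("30-Apr-35", "dd-MMM-yy", start)); // 2035-04-30
    }
}
```

With this window, 50 and 45 resolve to the 1900s and 35 resolves to 2035, matching the result shown in the question.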


hive> select current_date;
2017-03-28

-- 20 years after today

hive> select from_unixtime(unix_timestamp('37-03-28','yy-MM-dd'));
2037-03-28 00:00:00

hive> select from_unixtime(unix_timestamp('37-03-29','yy-MM-dd'));
1937-03-29 00:00:00

-- 80 years before today

hive> select from_unixtime(unix_timestamp('37-03-29','yy-MM-dd'));
1937-03-29 00:00:00

hive> select from_unixtime(unix_timestamp('37-03-28','yy-MM-dd'));
2037-03-28 00:00:00
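The boundary seen in the session above can also be worked out by hand. The following sketch is a simplified re-implementation of the windowing rule at day granularity, not the actual `SimpleDateFormat` source; the class `CenturyWindow` and method `resolve` are hypothetical names. It assumes the formatter was created some time after midnight on the given day, so a date that falls exactly on the window's first day is pushed forward by 100 years, as in the Hive output.

```java
import java.time.LocalDate;
import java.time.MonthDay;

public class CenturyWindow {

    // Hypothetical helper: resolve a 2-digit year plus month/day against a
    // window that starts 80 years before `created` and spans 100 years.
    static LocalDate resolve(int yy, int month, int day, LocalDate created) {
        LocalDate windowStart = created.minusYears(80);
        int year = (windowStart.getYear() / 100) * 100 + yy;

        // A candidate on or before the window's first day is treated as lying
        // before the window (assumption: the formatter was created after
        // midnight), so it moves to the next century.
        boolean beforeWindow = year < windowStart.getYear()
                || (year == windowStart.getYear()
                    && !MonthDay.of(month, day).isAfter(MonthDay.from(windowStart)));
        if (beforeWindow) {
            year += 100;
        }
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        LocalDate created = LocalDate.of(2017, 3, 28);
        System.out.println(resolve(37, 3, 28, created)); // 2037-03-28
        System.out.println(resolve(37, 3, 29, created)); // 1937-03-29
    }
}
```

So for a formatter created on 2017-03-28 the effective threshold is not a fixed year: it is the date 80 years earlier, and it moves forward as the creation date moves.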

Code walk-through

hive/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUnixTimeStamp.java

public class GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp {
...

  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    return (arguments.length == 0) ? currentTimestamp : super.evaluate(arguments);
  }

hive/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimeStamp.java

import java.text.SimpleDateFormat;
...

public class GenericUDFToUnixTimeStamp extends GenericUDF {
...
  private transient final SimpleDateFormat formatter = new SimpleDateFormat(lasPattern);
...
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
...
        retValue.set(formatter.parse(textVal).getTime() / 1000);
...
  }
}  

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/43084481/
