java - Hadoop job fails due to a Hive query error

Tags: java hadoop hive amazon-emr

Exception:

2017-06-21 22:47:49,993 FATAL ExecMapper (main): org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.dynamodb.DynamoDBItemWritable@2e17578f
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:643)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:149)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Exception while processing record: org.apache.hadoop.dynamodb.DynamoDBItemWritable@2e17578f
    at org.apache.hadoop.hive.dynamodb.DynamoDBObjectInspector.getColumnData(DynamoDBObjectInspector.java:136)
    at org.apache.hadoop.hive.dynamodb.DynamoDBObjectInspector.getStructFieldData(DynamoDBObjectInspector.java:97)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:328)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:626)
    ... 9 more
Caused by: java.lang.NumberFormatException: For input string: "17664956244983174066"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:444)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.hadoop.hive.dynamodb.DynamoDBDataParser.getNumberObject(DynamoDBDataParser.java:179)
    at org.apache.hadoop.hive.dynamodb.type.HiveDynamoDBNumberType.getHiveData(HiveDynamoDBNumberType.java:28)
    at org.apache.hadoop.hive.dynamodb.DynamoDBObjectInspector.getColumnData(DynamoDBObjectInspector.java:128)
    ... 12 more

The Hive query I submitted was:

INSERT OVERWRITE TABLE temp_1 
         SELECT * FROM temp_2 
         WHERE t_id="17664956244983174066" and t_ts="636214684577250000000";

Is this number too big to be converted to an int? I even tried sending 17664956244983174066 without the quotes, but I hit the same exception.

t_id is defined as BIGINT in the Hive table, and as N (Number) in DynamoDB.
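The deepest frames of the stack trace show Long.parseLong rejecting the value. A minimal standalone Java sketch (outside Hive/DynamoDB; the class name is my own) that reproduces the same failure:

```java
public class ParseDemo {
    public static void main(String[] args) {
        String tId = "17664956244983174066"; // value from the failing query

        // Long.MAX_VALUE is 9223372036854775807; the input exceeds it,
        // so Long.parseLong throws NumberFormatException, just as in the
        // DynamoDBDataParser frames of the stack trace.
        try {
            long parsed = Long.parseLong(tId);
            System.out.println("parsed: " + parsed);
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```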

Edit:

I tried defining t_id as string ==> schema mismatch, as DynamoDB stores this as int.

t_id as double ==> precision is lost; the values don't match.

What is the solution here?

Best Answer

Is this number too big to be converted to int?

Yes, this number is too large to convert to an integer type. According to the Apache Hive documentation on Numeric Types, the maximum value of BIGINT is 9223372036854775807. Your input, 17664956244983174066, is greater than that.

Here is a plain Hive query (no DynamoDB integration) that demonstrates the effect of trying to convert various inputs to BIGINT.

SELECT
    "9223372036854775807" AS str,
    cast("9223372036854775807" AS BIGINT) AS numbigint,
    cast("9223372036854775807" AS DOUBLE) AS numdouble
UNION ALL
SELECT
    "9223372036854775808" AS str,
    cast("9223372036854775808" AS BIGINT) AS numbigint,
    cast("9223372036854775808" AS DOUBLE) AS numdouble
UNION ALL
SELECT
    "17664956244983174066" AS str,
    cast("17664956244983174066" AS BIGINT) AS numbigint,
    cast("17664956244983174066" AS DOUBLE) AS numdouble
;

str                     numbigint               numdouble
9223372036854775807     9223372036854775807     9.2233720368547758e+18
9223372036854775808     NULL                    9.2233720368547758e+18
17664956244983174066    NULL                    1.7664956244983173e+19

At the documented BIGINT maximum, the value converts correctly. Just 1 higher, the conversion fails and yields NULL. The same thing happens with your input.

The query also shows that the conversion to DOUBLE succeeds. Depending on your use case, that might be a solution, though compared to integral data types it carries the risk of running into floating-point precision problems.
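That precision risk is concrete for this particular value: a double carries only 53 bits of significand, so a 20-digit integer cannot be stored exactly. A small Java sketch (class name my own) showing what a double actually holds for this input:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DoublePrecisionDemo {
    public static void main(String[] args) {
        String tId = "17664956244983174066";

        // The nearest representable double to the 20-digit input.
        double asDouble = Double.parseDouble(tId);

        // new BigDecimal(double) exposes the exact value the double holds,
        // so we can compare it digit-for-digit against the original.
        BigInteger stored = new BigDecimal(asDouble).toBigInteger();
        BigInteger original = new BigInteger(tId);

        System.out.println("original: " + original);
        System.out.println("stored:   " + stored);
        System.out.println("equal:    " + stored.equals(original)); // false: low digits are rounded
    }
}
```

So a DOUBLE column would accept the value but could not be used for exact equality filters like the WHERE clause in the question.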

Judging from your stack trace, the DynamoDB integration appears to throw a NumberFormatException in this case rather than producing NULL. That is arguably a bug in the DynamoDB connector, but even if it were changed to map to NULL, the conversion still could not succeed.

Regarding "java - Hadoop job fails due to a Hive query error", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44687726/
