hadoop - Hive AvroSerde exception on cluster

Tags: hadoop hive bigdata hiveql

I have an Avro file that I need to map to a Hive table. The best approach seemed to be AvroSerDe, so I ran the following commands on the cluster:

 - CREATE EXTERNAL TABLE my_db.new_table
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES (
    'avro.schema.url'='hdfs:///folder/mySchema.avsc');

- LOAD DATA inpath '/folder/myFile.avro' OVERWRITE INTO TABLE my_db.new_table;

All of these commands succeed. However, when I query the data with HiveQL, the Hadoop map tasks throw an exception:

SELECT
user.name as u_name
FROM my_db.new_table
LATERAL VIEW explode(users) user_table as user;

Exception:

2015-05-27 13:22:24,838 DEBUG [main] org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils: Failed to open file system for uri hdfs:///folder/mySchema.avsc assuming it is not a FileSystem url
java.io.IOException: Incomplete HDFS URI, no host: hdfs:///folder/mySchema.avsc
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:142)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:149)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:110)
    at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.getSchema(AvroGenericRecordReader.java:112)
    at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:70)
    at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:298)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:259)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:386)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:652)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

Hive version: 0.14

What is causing this exception?

Thanks!

Best answer

The problem is in:

TBLPROPERTIES (
    'avro.schema.url'='hdfs:///folder/mySchema.avsc');

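avro.schema.url must include the master node (NameNode) host name and port in the URI. The map tasks could not resolve the host-less hdfs:/// URI, which is what the "Incomplete HDFS URI, no host" error reports. So the correct version is: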

TBLPROPERTIES (
'avro.schema.url'='hdfs://MASTER_NODE_NAME:port/folder/mySchema.avsc');
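If the table already exists, it may not need to be dropped and recreated. The following is a minimal HiveQL sketch, assuming the table name and schema path from the question.

1. In the Hive CLI, `SET fs.defaultFS;` prints the cluster's default filesystem URI. Its host and port are what should replace `MASTER_NODE_NAME:port`. This assumes a Hadoop 2 cluster; older setups use `fs.default.name`.
2. `ALTER TABLE ... SET TBLPROPERTIES` then updates the property in place.
3. `DESCRIBE FORMATTED` shows the stored table parameters, so the change can be checked.

`MASTER_NODE_NAME` and `port` are still placeholders for your own cluster's values.

```sql
-- Print the cluster's default filesystem URI (expected form: hdfs://<host>:<port>)
SET fs.defaultFS;

-- Point the table at a fully qualified schema URI (host and port included).
-- MASTER_NODE_NAME and port are placeholders: substitute the values printed above.
ALTER TABLE my_db.new_table SET TBLPROPERTIES (
  'avro.schema.url'='hdfs://MASTER_NODE_NAME:port/folder/mySchema.avsc');

-- Verify that avro.schema.url now shows the full URI under "Table Parameters".
DESCRIBE FORMATTED my_db.new_table;
```

Changing only the table property keeps the data already loaded by `LOAD DATA` in place.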

Source: the Stack Overflow question "hadoop - Hive AvroSerde exception on cluster": https://stackoverflow.com/questions/30483532/
