hadoop - Defining the schema for an Avro key in Oozie

Tags: hadoop mapreduce oozie avro

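I'm new to MapReduce and Avro. My project basically has a single mapper function that takes Text data and outputs Avro data, so I declared my mapper like this:
public class AvroMapper extends Mapper<LongWritable, Text, AvroKey<CharSequence>, NullWritable>
I'm having trouble setting the schema for the key in my Oozie workflow. My Oozie file is configured as: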

<property>
    <name>mapred.output.key.class</name>
    <value>org.apache.hadoop.io.NullWritable</value>
</property>
<property>
    <name>mapred.mapoutput.key.class</name>
    <value>org.apache.avro.mapred.AvroKey</value>
</property>
<property>
    <name>mapred.mapoutput.value.class</name>
    <value>org.apache.hadoop.io.NullWritable</value>
</property>
<property>
    <name>mapred.output.key.comparator.class</name>
    <value>org.apache.avro.mapred.AvroKeyComparator</value>
</property>
<property>
    <name>avro.schema.output.key</name>
    <value>{my JSON schema}</value>
</property>
<property>
    <name>mapreduce.inputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.input.TextInputFormat</value>
</property>
<property>
    <name>mapreduce.outputformat.class</name>
    <value>org.apache.avro.mapreduce.AvroKeyOutputFormat</value>
</property>

But it still throws:
java.lang.NullPointerException
at org.apache.avro.mapred.Pair.getKeySchema(Pair.java:68)
at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:818)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:836)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.ha...

Please tell me where I'm going wrong.

Best answer

Use the AvroMapper and AvroReducer classes instead; I found that much easier. In that case, remember to use the Pair class and a Pair schema.
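On what "a Pair schema" means concretely: the `Pair.getKeySchema` frame in the stack trace above is called from `AvroKeyComparator.setConf`, which suggests the old-API comparator pulls the key schema out of a `Pair` record. So the schema you supply should be a record named `Pair` in the `org.apache.avro.mapred` namespace with a `key` field and a `value` field. The sketch below only builds that JSON string. The key/value types in `main` are hypothetical, and putting `"order":"ignore"` on `value` is my assumption of what Avro's own `Pair.getPairSchema` produces (so that sorting and grouping look at the key only).

```java
/**
 * Sketch: builds the JSON for an Avro Pair record schema.
 * keySchemaJson and valueSchemaJson must themselves be valid Avro schema JSON.
 */
final class PairSchemaJson {

    static String pairSchema(String keySchemaJson, String valueSchemaJson) {
        return "{\"type\":\"record\",\"name\":\"Pair\",\"namespace\":\"org.apache.avro.mapred\","
                + "\"fields\":["
                // the field the key comparator reads
                + "{\"name\":\"key\",\"type\":" + keySchemaJson + "},"
                // assumption: value is excluded from sorting/grouping
                + "{\"name\":\"value\",\"type\":" + valueSchemaJson + ",\"order\":\"ignore\"}"
                + "]}";
    }

    public static void main(String[] args) {
        // Hypothetical example: string keys, long values.
        System.out.println(pairSchema("\"string\"", "\"long\""));
        // prints {"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[{"name":"key","type":"string"},{"name":"value","type":"long","order":"ignore"}]}
    }
}
```

The printed string is the kind of value that goes where the configuration says `[... your fields ...]`, with your own key and value types substituted.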

Either way, configuring Avro in Oozie is not straightforward. To save you some time, here is my configuration for AvroMapper and AvroReducer:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>avro.input.schema</name>
        <value>{"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[... your fields ...]}</value>
    </property>
    <property>
        <name>avro.output.schema</name>
        <value>{"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[... your fields ...]}</value>
    </property>
    <property>
        <name>avro.mapper</name>
        <value>your.mapper.class.Name</value>
    </property>
    <property>
        <name>avro.reducer</name>
        <value>your.reducer.class.Name</value>
    </property>
    <property>
        <name>mapred.output.key.comparator.class</name>
        <value>org.apache.avro.mapred.AvroKeyComparator</value>
    </property>
    <property>
        <name>mapred.reducer.class</name>
        <value>org.apache.avro.mapred.HadoopReducer</value>
    </property>
    <property>
        <name>mapred.output.format.class</name>
        <value>org.apache.avro.mapred.AvroOutputFormat</value>
    </property>
    <property>
        <name>mapred.mapper.class</name>
        <value>org.apache.avro.mapred.HadoopMapper</value>
    </property>
    <property>
        <name>mapred.input.format.class</name>
        <value>org.apache.avro.mapred.AvroInputFormat</value>
    </property>
    <property>
        <name>mapred.output.key.class</name>
        <value>org.apache.avro.mapred.AvroWrapper</value>
    </property>
    <property>
        <name>mapred.mapoutput.value.class</name>
        <value>org.apache.avro.mapred.AvroValue</value>
    </property>
    <property>
        <name>io.serializations</name>
        <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.avro.mapred.AvroSerialization</value>
    </property>
    <property>
        <name>mapred.mapoutput.key.class</name>
        <value>org.apache.avro.mapred.AvroKey</value>
    </property>
</configuration>
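As a rough pre-flight check, a small JDK-only program can read a configuration like the one above and flag the combination from the question: the old-API `org.apache.avro.mapred.AvroKeyComparator` is configured, but neither `avro.map.output.schema` nor `avro.output.schema` is set (the question only sets the new-API `avro.schema.output.key`). That the old comparator reads those two properties is my understanding of the `org.apache.avro.mapred.AvroJob` defaults, not something stated in this thread; `AvroConfCheck` and its methods are hypothetical names, and the parser expects a single `<configuration>` root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical pre-flight check for Avro settings in a Hadoop/Oozie configuration XML. */
final class AvroConfCheck {

    static final String OLD_COMPARATOR = "org.apache.avro.mapred.AvroKeyComparator";

    /** Collects every <property><name>..</name><value>..</value></property> pair. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("name");
                if (names.getLength() == 0) {
                    continue; // skip malformed entries without a <name>
                }
                NodeList values = property.getElementsByTagName("value");
                String value = values.getLength() > 0 ? values.item(0).getTextContent().trim() : "";
                props.put(names.item(0).getTextContent().trim(), value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable configuration XML", e);
        }
    }

    /**
     * True when the old-API comparator is configured but neither property it
     * (by assumption) derives the key schema from is present.
     */
    static boolean missingMapOutputSchema(Map<String, String> props) {
        if (!OLD_COMPARATOR.equals(props.get("mapred.output.key.comparator.class"))) {
            return false;
        }
        return !props.containsKey("avro.map.output.schema")
                && !props.containsKey("avro.output.schema");
    }

    public static void main(String[] args) {
        // Trimmed-down version of the question's settings, wrapped in a root element.
        String questionConf = "<configuration>"
                + "<property><name>mapred.output.key.comparator.class</name>"
                + "<value>org.apache.avro.mapred.AvroKeyComparator</value></property>"
                + "<property><name>avro.schema.output.key</name>"
                + "<value>{\"type\":\"string\"}</value></property>"
                + "</configuration>";
        System.out.println(missingMapOutputSchema(readProperties(questionConf))); // prints true
    }
}
```

Run against this answer's configuration, the check passes because `avro.output.schema` is set; run against the question's properties, it reports the missing schema.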

Regarding hadoop - Defining the schema for an Avro key in Oozie, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21741627/
