java - Unable to run a MapReduce job in Hadoop 2.7 - type mismatch

Running the program fails with: Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable

I have tried several suggestions from Google and Stack sites, but no luck; I still get the same exception. Any idea what I am missing?

My imports

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

My map class

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> 
{
    Text k = new Text();


    public void map(Text key, Iterable<IntWritable> value, Context context) 
                throws IOException, InterruptedException {
        String line = value.toString(); 
        StringTokenizer tokenizer = new StringTokenizer(line," "); 
        while (tokenizer.hasMoreTokens()) { 
            String year= tokenizer.nextToken();
            k.set(year);
            String temp= tokenizer.nextToken().trim();
            int v = Integer.parseInt(temp); 
            context.write(k,new IntWritable(v)); 
        }
    }
}

And my reduce class

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
{

    public void reduce (Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxtemp=0;
        for(IntWritable it : values) {
            int temperature= it.get();
            if(maxtemp<temperature)
            {
                maxtemp =temperature;
            }
        }
        context.write(key, new IntWritable(maxtemp)); 
    }
}

And main

Configuration conf = new Configuration();

Job job = new Job(conf, "MaxTemp");
job.setJarByClass(MaxTemp.class);
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

Path outputPath = new Path(args[1]);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

outputPath.getFileSystem(conf).delete(outputPath);

System.exit(job.waitForCompletion(true) ? 0 : 1);

(I compiled this code with Java 7 in the Eclipse IDE (Mars) and exported it as a runnable jar; the Hadoop version is 2.7.0.)

Best answer

If you add the @Override annotation to your map function, you will find that it does not override the map method in Mapper.

If you look at the Javadoc for Mapper, you will see that the map method should look like:

map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context)

whereas yours looks like

map(Text key, Iterable<IntWritable> value, Context context)

So yours should be:

map(LongWritable key, Text value, Context context)

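Because you are not actually overriding the base map method of the Mapper class, your method is never called. The one in Mapper is used instead, and it looks like: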

protected void map(KEYIN key, VALUEIN value, 
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
}

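This accepts the LongWritable and Text and writes them straight back out (an identity mapper), which does not match the Text and IntWritable you told the job to expect. That is why the error says it expected Text but received LongWritable.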
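To see the overload-versus-override effect without a cluster, here is a minimal plain-Java sketch. BaseMapper, WrongMap, RightMap and runOne are hypothetical stand-ins I made up for illustration (not Hadoop classes), with Long/String/Integer playing the roles of LongWritable/Text/IntWritable and a List in place of Context.

```java
import java.util.ArrayList;
import java.util.List;

public class OverrideDemo {
    // Hypothetical stand-in for Hadoop's Mapper: the default map() is an identity function.
    static class BaseMapper<KI, VI, KO, VO> {
        @SuppressWarnings("unchecked")
        protected void map(KI key, VI value, List<Object[]> out) {
            out.add(new Object[] { (KO) key, (VO) value });
        }
    }

    // Parameter types differ from <KI, VI>, so this is an overload, not an override.
    // Adding @Override here would be rejected by the compiler.
    static class WrongMap extends BaseMapper<Long, String, String, Integer> {
        public void map(String key, Iterable<Integer> value, List<Object[]> out) {
            out.add(new Object[] { "never reached", -1 });
        }
    }

    // Parameter types match <KI, VI>, so @Override compiles and this method is used.
    static class RightMap extends BaseMapper<Long, String, String, Integer> {
        @Override
        protected void map(Long key, String value, List<Object[]> out) {
            String[] parts = value.trim().split("\\s+");
            out.add(new Object[] { parts[0], Integer.parseInt(parts[1]) });
        }
    }

    // Mimics the framework: it only knows the base type and calls map(KI, VI, ...).
    static Object[] runOne(BaseMapper<Long, String, String, Integer> m, long offset, String line) {
        List<Object[]> out = new ArrayList<>();
        m.map(offset, line, out);
        return out.get(0);
    }

    public static void main(String[] args) {
        Object[] wrong = runOne(new WrongMap(), 0L, "1950 22");
        Object[] right = runOne(new RightMap(), 0L, "1950 22");
        System.out.println(wrong[0].getClass().getSimpleName()); // prints "Long"
        System.out.println(right[0].getClass().getSimpleName()); // prints "String"
    }
}
```

With WrongMap the emitted key is still the input offset, which corresponds to the "expected Text, received LongWritable" situation in the question.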

In your driver, these lines:

job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

should be more like:

job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

You need to use your own implementations rather than the base classes.
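As a rough illustration of why the class you register matters, the plain-Java sketch below mimics a framework that is handed a Class object and instantiates it reflectively. BaseMapper, MyMap and runWith are hypothetical names for this example only; the assumption is simply that the job creates whatever class you pass to it, so passing the base class gives you the base behaviour even when a correct subclass exists.

```java
public class RegistrationDemo {
    // Hypothetical stand-in for the base Mapper: returns its input unchanged.
    static class BaseMapper {
        protected String map(String line) {
            return line;
        }
    }

    // Hypothetical user implementation: keeps only the first token of the line.
    static class MyMap extends BaseMapper {
        @Override
        protected String map(String line) {
            return line.trim().split("\\s+")[0];
        }
    }

    // Mimics a framework that creates the registered class reflectively and calls map().
    static String runWith(Class<? extends BaseMapper> mapperClass, String line) {
        try {
            BaseMapper mapper = mapperClass.getDeclaredConstructor().newInstance();
            return mapper.map(line);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + mapperClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runWith(BaseMapper.class, "1950 22")); // prints "1950 22"
        System.out.println(runWith(MyMap.class, "1950 22"));      // prints "1950"
    }
}
```

Registering BaseMapper.class silently runs the identity behaviour, which is the same thing that happens in the driver when Mapper.class is passed instead of Map.class.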

Regarding "java - Unable to run a MapReduce job in Hadoop 2.7 - type mismatch", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/39393614/
