java - Getting wrong value class: class org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable

Tags: java mapreduce hadoop2

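I am learning MapReduce and wrote a program that computes the total booking duration for members and non-members. I set all the job configuration I believe is required, but when I run the job with the hadoop command it fails with a "wrong value class" error. I searched Stack Overflow for solutions but could not debug the problem. The mapper's output types and the reducer's input types match. Can anyone help?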

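package com.onboarding.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class BixiMontrealAnalysis {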

    public static class BixiMapper extends Mapper <LongWritable, Text, IntWritable, IntWritable> {
        public void map(LongWritable offset, Text line, Context context) throws IOException, InterruptedException {
            String csvAttributes[] = line.toString().split(",");
            int isMember = 0;
            int duration = 0;
            try {
                duration = Integer.parseInt(csvAttributes[4]);
                isMember = Integer.parseInt(csvAttributes[5]);
            } catch (Exception e) {
                System.out.println("Will Emit 0,0");
            }
            context.write(new IntWritable(isMember), new IntWritable(duration));
        }
    }

    public static class BixiReducer extends Reducer <IntWritable, IntWritable, IntWritable, LongWritable> {
        public void reduce(IntWritable isMember, Iterable <IntWritable> combinedDurationByIsMember, Context context) throws IOException, InterruptedException {
            long sum = 0L;
            for (IntWritable duration: combinedDurationByIsMember) {
                sum = sum + (long) duration.get();
            }
            context.write(isMember, new LongWritable(sum));
        }
    }

    public static void main(String args[]) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "bix-montreal-job");
        job.setJarByClass(BixiMontrealAnalysis.class);
        job.setMapperClass(BixiMapper.class);

        job.setCombinerClass(BixiReducer.class);
        job.setReducerClass(BixiReducer.class);

        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(LongWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

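I expect the output to be two key/value pairs: 0 with the sum of durations for non-members, and 1 with the sum of durations for members.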

CSV contents

start_date,start_station_code,end_date,end_station_code,duration_sec,is_member
2019-07-01 00:00:03,6014,2019-07-01 00:04:26,6023,262,1
2019-07-01 00:00:07,6036,2019-07-01 00:34:54,6052,2087,0
2019-07-01 00:00:11,6018,2019-07-01 00:06:48,6148,396,1
2019-07-01 00:00:12,6202,2019-07-01 00:17:25,6280,1032,1
2019-07-01 00:00:15,6018,2019-07-01 00:06:57,6148,401,0
2019-07-01 00:00:20,6248,2019-07-01 00:15:40,6113,920,1
2019-07-01 00:00:37,6268,2019-07-01 00:15:00,6195,862,0

Below is the stack trace:

Error: java.io.IOException: wrong value class: class org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:194)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1374)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1691)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at com.onboarding.hadoop.BixiMontrealAnalysis$BixiReducer.reduce(BixiMontrealAnalysis.java:43)
    at com.onboarding.hadoop.BixiMontrealAnalysis$BixiReducer.reduce(BixiMontrealAnalysis.java:37)

Best answer

job.setCombinerClass(BixiReducer.class);

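I had set the combiner class to the same class as the reducer, following the standard WordCount example, but that is not valid here. A combiner runs on the map side and produces intermediate records that are then fed to the reducer, which reduces the reducer's load. Its output key and value types must therefore match the map output types (IntWritable, IntWritable). BixiReducer emits LongWritable values, so when it runs as the combiner the framework rejects the write with "wrong value class". The Task$CombineOutputCollector frame in the stack trace shows that the failure happens in the combine step, not in the final reduce. In WordCount the reducer can double as the combiner only because its input and output types are identical. Removing the setCombinerClass line fixes the job; alternatively, use a separate combiner class whose output value type is IntWritable.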
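To make the type contract concrete, here is a minimal plain-Java sketch of the map, combine, and reduce steps applied to the sample rows from the question. It does not use Hadoop at all: the class name `CombinerSketch`, its methods, and the split of the rows into two "map tasks" are assumptions made purely for illustration. The point it models is that the combine step takes and returns the same types as the map output (int key, int value), while only the reduce step widens to long.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free model of the job in the question.
class CombinerSketch {

    // Map step: parse one CSV line into {isMember, duration}. Fields that
    // fail to parse (e.g. in the header row) stay 0, as in the question's mapper.
    static int[] map(String line) {
        String[] fields = line.split(",");
        int isMember = 0;
        int duration = 0;
        try {
            duration = Integer.parseInt(fields[4]);
            isMember = Integer.parseInt(fields[5]);
        } catch (RuntimeException e) {
            // keep the defaults
        }
        return new int[] {isMember, duration};
    }

    // Combine step: runs on the output of ONE map task. Input and output are
    // both (int key, int value), i.e. the same types as the map output.
    static Map<Integer, Integer> combine(List<int[]> mapOutput) {
        Map<Integer, Integer> partial = new TreeMap<>();
        for (int[] kv : mapOutput) {
            partial.merge(kv[0], kv[1], Integer::sum);
        }
        return partial;
    }

    // Reduce step: consumes int values and widens to long for the final total.
    static Map<Integer, Long> reduce(List<Map<Integer, Integer>> partials) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (Map<Integer, Integer> partial : partials) {
            for (Map.Entry<Integer, Integer> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue().longValue(), Long::sum);
            }
        }
        return totals;
    }

    // Runs map + combine per split, then a single reduce over all partial sums.
    static Map<Integer, Long> run(List<List<String>> splits) {
        List<Map<Integer, Integer>> partials = new ArrayList<>();
        for (List<String> split : splits) {
            List<int[]> mapOutput = new ArrayList<>();
            for (String line : split) {
                mapOutput.add(map(line));
            }
            partials.add(combine(mapOutput));
        }
        return reduce(partials);
    }

    // The sample CSV from the question, arbitrarily divided into two splits.
    static List<List<String>> sampleSplits() {
        List<String> first = Arrays.asList(
            "start_date,start_station_code,end_date,end_station_code,duration_sec,is_member",
            "2019-07-01 00:00:03,6014,2019-07-01 00:04:26,6023,262,1",
            "2019-07-01 00:00:07,6036,2019-07-01 00:34:54,6052,2087,0",
            "2019-07-01 00:00:11,6018,2019-07-01 00:06:48,6148,396,1");
        List<String> second = Arrays.asList(
            "2019-07-01 00:00:12,6202,2019-07-01 00:17:25,6280,1032,1",
            "2019-07-01 00:00:15,6018,2019-07-01 00:06:57,6148,401,0",
            "2019-07-01 00:00:20,6248,2019-07-01 00:15:40,6113,920,1",
            "2019-07-01 00:00:37,6268,2019-07-01 00:15:00,6195,862,0");
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        System.out.println(run(sampleSplits())); // prints {0=3350, 1=2610}
    }
}
```

In the real job, the analogous change would presumably be a separate combiner declared as `Reducer<IntWritable, IntWritable, IntWritable, IntWritable>` and passed to `job.setCombinerClass(...)`. One caveat with that design: int partial sums can overflow on large inputs, so emitting `LongWritable` durations from the mapper, so that the reducer's input and output value types coincide, may be the more robust choice.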

Regarding "java - Getting wrong value class: class org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58806046/
