java - Hadoop java.lang.ArrayIndexOutOfBoundsException:3

Tags: java hadoop

The input is a list of housing records, where each record contains information about a single house:
(address, city, state, zip, value). The five fields in a record are
separated by commas (,). The output should be the average house value for each zip code. Here is my current code:

public class ziphousevalue1 {

    public static class ZipHouseValueMapper extends Mapper < LongWritable, Text, Text, IntWritable > {
        private static final Text zip = new Text();
        private static final IntWritable value = new IntWritable();

        protected void map(LongWritable offset, Text line, Context context) throws IOException, InterruptedException {
            String[] tokens = value.toString().split(",");
            zip.set(tokens[3]);
            value.set(Integer.parseInt(tokens[4]));
            context.write(new Text(zip), value);
        }
    }

    public static class ZipHouseValueReducer extends Reducer < Text, IntWritable, Text, DoubleWritable > {

        private DoubleWritable average = new DoubleWritable();

        protected void reduce(Text zip, Iterable < IntWritable > values, Context context) throws IOException, InterruptedException {
            int count = 0;
            int sum = 0;
            for (IntWritable value: values) {
                sum += value.get();
                count++;
            }
            average.set(sum / count);
            context.write(zip, average);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: ziphousevalue <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "ziphousevalue");
        job.setJarByClass(ziphousevalue1.class);
        job.setMapperClass(ZipHouseValueMapper.class);
        job.setReducerClass(ZipHouseValueReducer.class);

        job.setNumReduceTasks(3);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        configure(conf);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static void configure(Configuration conf) {
        System.out.println("Test+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");

    }
}

However, it produces the following error. I have looked at similar questions on this site, and none of them seems to solve the problem. I made sure the input file is correct. Is there anything else I should check to resolve this error? Thanks for your time.
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at ziphousevalue1$ZipHouseValueMapper.map(ziphousevalue1.java:29)
at ziphousevalue1$ZipHouseValueMapper.map(ziphousevalue1.java:24)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/11/11 22:10:42 INFO mapreduce.Job: Job job_local112498506_0001 running in uber mode : false
15/11/11 22:10:42 INFO mapreduce.Job:  map 0% reduce 0%
15/11/11 22:10:42 INFO mapreduce.Job: Job job_local112498506_0001 failed with state FAILED due to: NA
15/11/11 22:10:42 INFO mapreduce.Job: Counters: 0

Best Answer

In ZipHouseValueMapper.map you have:

String[] tokens = value.toString().split(",");
zip.set(tokens[3]);
value.set(Integer.parseInt(tokens[4]));

This means value must contain at least five comma-separated fields, but value is a freshly created IntWritable. When converted to a String, is it likely to contain at least five comma-separated fields? Hardly. You probably want to do something with line instead.
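In other words, the fix is to split the map input value `line`, not the `IntWritable` field `value`. A minimal sketch of the corrected parsing step in plain Java, using a made-up sample record in the five-field format described in the question:

```java
public class ZipParseDemo {
    public static void main(String[] args) {
        // Inside map(), the split should be on the input line:
        //   String[] tokens = line.toString().split(",");
        // Hypothetical record: (address, city, state, zip, value)
        String line = "12 Main St,Springfield,IL,62704,250000";
        String[] tokens = line.split(",");
        String zip = tokens[3];                  // the zip field
        int value = Integer.parseInt(tokens[4]); // the house value
        System.out.println(zip + "\t" + value);  // prints "62704	250000"
    }
}
```

With that change the mapper emits (zip, value) pairs as intended. Note also, as a side point, that `sum / count` in the reducer is integer division; casting to `double` before dividing would give a true average.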

Regarding java - Hadoop java.lang.ArrayIndexOutOfBoundsException:3, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33664464/
