java - Cannot run word count on Hadoop

Tags: java hadoop mapreduce

I am trying to run the Hadoop word count example in Eclipse. I simply added all the jar files from the hadoop directory and the hadoop/lib directory to the project's build path, but I get the following error:

java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:23)
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:400)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
2013-10-23 18:59:20,841 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1288)) Job job_local_0001 running in uber mode : false
2013-10-23 18:59:20,843 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1295))  map 0% reduce 0%
2013-10-23 18:59:20,847 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1308)) Job job_local_0001 failed with state FAILED due to: NA
2013-10-23 18:59:20,866 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1313)) Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:891)
at org.orzota.bookx.mappers.MyHadoopDriver.main(MyHadoopDriver.java:46)

Can you help me figure out what is going wrong?

MyHadoopMapper is:

package org.orzota.bookx.mappers;
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyHadoopMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable _key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String st = value.toString();
        String[] bookdata = st.split(",");
        //for (int i = 0; i < bookdata.length; i++) {
        //    System.out.println(bookdata[i]);
        //}
        //if (bookdata.length != 8) {
        //    System.out.println("Warning, bad Entry.." + bookdata.length);
        //    return;
        //}
        output.collect(new Text(bookdata[1]), one);
    }

}

Best Answer

It looks like the error comes from this line:

output.collect(new Text(bookdata[1]), one);

Given that, there are two likely explanations for the exception you are seeing:

  • Your input file contains lines with no , in them.
  • Your input file contains empty lines.

In both cases String.split(",") returns a single-element array (an empty line splits to one empty string, and a line without a comma splits to the line itself), so accessing bookdata[1] throws an ArrayIndexOutOfBoundsException.
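A minimal sketch of the fix, which is essentially re-enabling the length check that is commented out in the mapper: split the line first, and bail out on records that do not have the expected field before indexing into the array. The class and line contents below are illustrative, not from the original code.

```java
// Hypothetical helper illustrating the guard; in the real mapper this would
// be an early `return;` before output.collect(new Text(bookdata[1]), one).
public class SafeField {

    // Returns the field at `index`, or null when the record is malformed.
    static String fieldOrNull(String line, int index) {
        String[] bookdata = line.split(",");
        // An empty line and a line without any comma both yield a
        // single-element array, so bookdata[1] would throw.
        if (bookdata.length <= index) {
            return null;
        }
        return bookdata[index];
    }

    public static void main(String[] args) {
        System.out.println(fieldOrNull("id1,Some Title,Some Author", 1)); // Some Title
        System.out.println(fieldOrNull("line-without-comma", 1));         // null
        System.out.println(fieldOrNull("", 1));                           // null
    }
}
```

Applied to the mapper, this amounts to `if (bookdata.length < 2) return;` right after the split (or restoring the stricter `bookdata.length != 8` check if every valid record has exactly 8 fields), so malformed input lines are skipped instead of crashing the map task.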

Regarding "java - Cannot run word count on Hadoop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19547416/
