java - word.set() throws NullPointerException in Hadoop MapReduce

Tags: java hadoop nullpointerexception mapreduce word-count

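I am new to MapReduce programming, and my class started with the simple word count example. I am trying a different approach, though. My HDFS input folder contains two input files, and I want to produce output like this:
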
anyword1 --> filename1     2
anyword2 --> filename2     3

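I wrote a mapper class that concatenates each word with the file name to form the key, but when I set the key value on the Text object it throws a NullPointerException. Can someone help and point out what I am doing wrong?

My mapper class: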

public static class TokenizerMapper 
       extends Mapper<Object, Text, Text,IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = null;
    private String fileText = null;

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
      String modifiedWord ="";
      fileName = "-->"+fileName;
      System.out.println("filename before word-->"+fileName);
      while (itr.hasMoreTokens()) {
        modifiedWord = itr.nextToken().toString();//+fileName;
        modifiedWord = modifiedWord + fileName;
        System.out.println("modified word-->"+modifiedWord);
        word.set(modifiedWord);
        context.write(word, one);
        System.out.println("Mapper context-->"+word);
      }
    }
  }

------ Exception ----

[root@LinuxCentos7 hadoop]# hadoop jar /usr/local/mapreduceexample/WordCountEx3.jar /user/Siddharth/Input /user/Siddharth/output
17/06/09 23:32:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/09 23:32:32 INFO input.FileInputFormat: Total input paths to process : 2
17/06/09 23:32:32 INFO mapreduce.JobSubmitter: number of splits:2
17/06/09 23:32:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1497025644387_0011
17/06/09 23:32:33 INFO impl.YarnClientImpl: Submitted application application_1497025644387_0011
17/06/09 23:32:33 INFO mapreduce.Job: The url to track the job: http://LinuxCentos7:8088/proxy/application_1497025644387_0011/
17/06/09 23:32:33 INFO mapreduce.Job: Running job: job_1497025644387_0011
17/06/09 23:32:52 INFO mapreduce.Job: Job job_1497025644387_0011 running in uber mode : false
17/06/09 23:32:52 INFO mapreduce.Job:  map 0% reduce 0%
17/06/09 23:33:16 INFO mapreduce.Job:  map 100% reduce 0%
17/06/09 23:33:16 INFO mapreduce.Job: Task Id : attempt_1497025644387_0011_m_000000_0, Status : FAILED
Error: java.lang.NullPointerException
    at com.hadoop.WordCountEx3$TokenizerMapper.map(WordCountEx3.java:56)
    at com.hadoop.WordCountEx3$TokenizerMapper.map(WordCountEx3.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

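Best answer

The word field is declared as null and never assigned, so word.set(...) dereferences a null reference. Initialize the word variable with a Text instance: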

private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
private String fileText = null;

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    ...
}
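
To see the failure and the fix without a Hadoop cluster, here is a minimal plain-Java sketch. It is not the asker's mapper: MutableText is a hypothetical stand-in for org.apache.hadoop.io.Text (only set() and toString() are mimicked), and ReusableKeyDemo, brokenWord, buildKeys and brokenFieldThrows are names invented for illustration. It shows that calling set() on a field left as null throws, while a field instantiated once can be reused for every token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ReusableKeyDemo {

    // Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable holder with set().
    static final class MutableText {
        private String value = "";
        void set(String s) { value = s; }
        @Override public String toString() { return value; }
    }

    // Mirrors the question: the field is declared but never instantiated.
    static MutableText brokenWord = null;

    // Mirrors the answer: the field is instantiated once and reused for every token.
    static final MutableText word = new MutableText();

    // Builds "token-->fileName" keys the way the mapper does, reusing one key object.
    static List<String> buildKeys(String line, String fileName) {
        List<String> keys = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken() + "-->" + fileName);
            keys.add(word.toString()); // stands in for context.write(word, one)
        }
        return keys;
    }

    // Returns true if calling set() on the uninitialized field throws NullPointerException.
    static boolean brokenFieldThrows() {
        try {
            brokenWord.set("anything");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(brokenFieldThrows());
        System.out.println(buildKeys("hello world hello", "file1.txt"));
    }
}
```

In the real mapper the same pattern applies: keep a single Text instance as a field, call word.set(...) for each token, and pass it to context.write(word, one).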

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/44464442/
