java - Hadoop Reducer code parseInt error

Tags: java hadoop mapreduce bigdata

I am a beginner, still learning Hadoop.
I have been trying to solve a problem for several days now, but I am getting an error in the reduce phase. My code runs fine through the Mapper, but fails in the Reducer. Any guidance would be appreciated.
The code is quite simple: it combines the values for each key.
The input is:
Key   F1   F2  Total
key1  12   1   5
key1  23   1   5
key1  34   1   5
key1  23   1   5
key1  2    1   5
key1  12   1   5
key1  12   1   5
key1  4    1   5
key2  35   2   5
key2  456  2   5
key2  57   2   5
key2  67   2   5
key2  8    2   5
key2  8    2   5
key2  78   2   5
key2  78   2   5
key3  1    3   5
key3  1    3   5
key3  1    3   5
key3  1    3   5

The desired output is:
Key   sum(f1)  sum(f2)  sum(f3)  avg(f1)
key1  122      8        40       15.25
key2  787      16       40       98.375
key3  4        12       20       1

I know this is very simple code, but I have been stuck on it for a long time and cannot resolve the issue.
My code is:
Mapper:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AMapper extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] line1 = value.toString().split(",");
    // Join every field after the key with '#', e.g. "key1,12,1,5" -> "12#1#5"
    String abc = "";
    for (int j = 1; j < line1.length; j++) {
      if (j == 1) {
        abc = abc.concat(line1[j]);
      } else {
        abc = abc.concat("#").concat(line1[j]);
      }
    }
    // Emit the first field as the key and the joined remainder as the value
    context.write(new Text(line1[0]), new Text(abc));
  }
}

Reducer:
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    int[] f = new int[3]; // running sums for f1, f2, and total
    for (Text val : values) {
      // Split "f1#f2#total" back into its fields and accumulate
      String[] line1 = val.toString().split("#");
      f[0] += Integer.parseInt(line1[0]); // these parseInt calls are where the task fails
      f[1] += Integer.parseInt(line1[1]);
      f[2] += Integer.parseInt(line1[2]);
    }
    context.write(key, new Text(f[0] + "\t" + f[1] + "\t" + f[2]));
  }
}

Main:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class Agger {

  public static void main(String[] args) throws Exception {

    if (args.length != 2) {
      System.out.printf("Usage: Agger <input dir> <output dir>\n");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(Agger.class);
    job.setJobName("Aggregation");

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(AMapper.class);
    job.setReducerClass(AReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }


}

That is the code.
This is the error I am getting:
15/03/19 04:53:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/03/19 04:53:23 INFO input.FileInputFormat: Total input paths to process : 1
15/03/19 04:53:23 WARN snappy.LoadSnappy: Snappy native library is available
15/03/19 04:53:23 INFO snappy.LoadSnappy: Snappy native library loaded
15/03/19 04:53:23 INFO mapred.JobClient: Running job: job_201503190444_0003
15/03/19 04:53:24 INFO mapred.JobClient:  map 0% reduce 0%
15/03/19 04:53:29 INFO mapred.JobClient:  map 100% reduce 0%
15/03/19 04:53:33 INFO mapred.JobClient: Task Id : attempt_201503190444_0003_r_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at AReducer.reduce(AReducer.java:17)
    at AReducer.reduce(AReducer.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

attempt_201503190444_0003_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201503190444_0003_r_000000_0: log4j:WARN Please initialize the log4j system properly.
attempt_201503190444_0003_r_000000_0: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
15/03/19 04:53:38 INFO mapred.JobClient: Task Id : attempt_201503190444_0003_r_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at AReducer.reduce(AReducer.java:17)
    at AReducer.reduce(AReducer.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

attempt_201503190444_0003_r_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201503190444_0003_r_000000_1: log4j:WARN Please initialize the log4j system properly.
attempt_201503190444_0003_r_000000_1: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
15/03/19 04:53:42 INFO mapred.JobClient: Task Id : attempt_201503190444_0003_r_000000_2, Status : FAILED
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at AReducer.reduce(AReducer.java:17)
    at AReducer.reduce(AReducer.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

attempt_201503190444_0003_r_000000_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201503190444_0003_r_000000_2: log4j:WARN Please initialize the log4j system properly.
attempt_201503190444_0003_r_000000_2: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
15/03/19 04:53:50 INFO mapred.JobClient: Job complete: job_201503190444_0003
15/03/19 04:53:50 INFO mapred.JobClient: Counters: 29
15/03/19 04:53:50 INFO mapred.JobClient:   File System Counters
15/03/19 04:53:50 INFO mapred.JobClient:     FILE: Number of bytes read=0
15/03/19 04:53:50 INFO mapred.JobClient:     FILE: Number of bytes written=182376
15/03/19 04:53:50 INFO mapred.JobClient:     FILE: Number of read operations=0
15/03/19 04:53:50 INFO mapred.JobClient:     FILE: Number of large read operations=0
15/03/19 04:53:50 INFO mapred.JobClient:     FILE: Number of write operations=0
15/03/19 04:53:50 INFO mapred.JobClient:     HDFS: Number of bytes read=347
15/03/19 04:53:50 INFO mapred.JobClient:     HDFS: Number of bytes written=0
15/03/19 04:53:50 INFO mapred.JobClient:     HDFS: Number of read operations=2
15/03/19 04:53:50 INFO mapred.JobClient:     HDFS: Number of large read operations=0
15/03/19 04:53:50 INFO mapred.JobClient:     HDFS: Number of write operations=0
15/03/19 04:53:50 INFO mapred.JobClient:   Job Counters 
15/03/19 04:53:50 INFO mapred.JobClient:     Failed reduce tasks=1
15/03/19 04:53:50 INFO mapred.JobClient:     Launched map tasks=1
15/03/19 04:53:50 INFO mapred.JobClient:     Launched reduce tasks=4
15/03/19 04:53:50 INFO mapred.JobClient:     Data-local map tasks=1
15/03/19 04:53:50 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=4785
15/03/19 04:53:50 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=19188
15/03/19 04:53:50 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/19 04:53:50 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/19 04:53:50 INFO mapred.JobClient:   Map-Reduce Framework
15/03/19 04:53:50 INFO mapred.JobClient:     Map input records=20
15/03/19 04:53:50 INFO mapred.JobClient:     Map output records=20
15/03/19 04:53:50 INFO mapred.JobClient:     Map output bytes=253
15/03/19 04:53:50 INFO mapred.JobClient:     Input split bytes=94
15/03/19 04:53:50 INFO mapred.JobClient:     Combine input records=0
15/03/19 04:53:50 INFO mapred.JobClient:     Combine output records=0
15/03/19 04:53:50 INFO mapred.JobClient:     Spilled Records=20
15/03/19 04:53:50 INFO mapred.JobClient:     CPU time spent (ms)=400
15/03/19 04:53:50 INFO mapred.JobClient:     Physical memory (bytes) snapshot=150429696
15/03/19 04:53:50 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=387174400
15/03/19 04:53:50 INFO mapred.JobClient:     Total committed heap usage (bytes)=160501760

I am trying to get the output from the mapper in the form {'key1', 1#2#3, 2#3#4, 3#4#5}.
That output is fed to the reducer, which splits each value apart and sums the fields. But somehow the code does not work.
Any help or guidance would be appreciated. Also, if someone could point me to a good site for learning Hadoop, that alone would help me a lot, so please do!

Best Answer

I don't think there is anything wrong with your program. You are getting the exception because Java is trying to convert the empty string "" to an Integer. This error usually appears when the data file (a text file) contains extra whitespace or blank lines.
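The failure is easy to reproduce in isolation; this tiny standalone program throws the exact exception seen in the task log:

public class ParseDemo {
  public static void main(String[] args) {
    // A blank or trailing line in the input reaches parseInt as an empty token:
    Integer.parseInt(""); // java.lang.NumberFormatException: For input string: ""
  }
}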

To avoid this kind of error, wrap both the mapper-side and reducer-side logic in a try-catch.
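The accepted answer stops at that advice. As one way to apply it (a sketch, not the poster's code), the reducer below assumes the mapper keeps emitting values in the f1#f2#total format, skips blank or malformed records instead of failing the task, and also tracks the record count so the avg(f1) column from the desired output can be computed:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SafeAReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    long sumF1 = 0, sumF2 = 0, sumTotal = 0;
    long count = 0;
    for (Text val : values) {
      String[] fields = val.toString().trim().split("#");
      if (fields.length < 3) {
        continue; // a blank line in the input produces an empty value -- skip it
      }
      try {
        sumF1 += Integer.parseInt(fields[0].trim());
        sumF2 += Integer.parseInt(fields[1].trim());
        sumTotal += Integer.parseInt(fields[2].trim());
        count++;
      } catch (NumberFormatException e) {
        // A field was not a clean integer; count it instead of failing the task.
        // The counter group/name here are made-up labels, not a Hadoop built-in.
        context.getCounter("AReducer", "MalformedRecords").increment(1);
      }
    }
    double avgF1 = (count == 0) ? 0.0 : (double) sumF1 / count;
    context.write(key, new Text(sumF1 + "\t" + sumF2 + "\t" + sumTotal + "\t" + avgF1));
  }
}

Whether to drop bad records silently is a judgment call; the counter above at least makes the skipped records visible in the job counters instead of hiding the data problem.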

Regarding java - Hadoop Reducer code parseInt error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29140819/
