I'm a beginner, still learning Hadoop.
I've been trying to solve a problem for several days now, but I'm hitting an error in the reduce phase. My code runs fine with just the mapper, but fails once the reducer is involved. Any guidance would be appreciated.
The code is simple: it aggregates the values associated with each key.
The input is:
key   f1   f2  total
key1  12   1   5
key1  23   1   5
key1  34   1   5
key1  23   1   5
key1  2    1   5
key1  12   1   5
key1  12   1   5
key1  4    1   5
key2  35   2   5
key2  456  2   5
key2  57   2   5
key2  67   2   5
key2  8    2   5
key2  8    2   5
key2  78   2   5
key2  78   2   5
key3  1    3   5
key3  1    3   5
key3  1    3   5
key3  1    3   5
The desired output is:
key   sum(f1)  sum(f2)  sum(f3)  avg(f1)
key1  122      8        40       15.25
key2  787      16       40       98.375
key3  4        12       20       1
I know this is very simple code, but I'm stuck somewhere and haven't been able to resolve this for a long time.
My code is:
Mapper:
import java.io.IOException;
//import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String a = "";
        String abc = "";
        String[] line1 = value.toString().split(",");
        for (int j = 1; j < line1.length; j++) {
            if (j == 1) {
                abc = abc.concat(line1[j]);
            } else {
                abc = abc.concat("#").concat(line1[j]);
            }
        }
        a = line1[0];
        context.write(new Text(a), new Text(abc));
    }
}
Reducer:
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int f[] = new int[3];
        f[0] = f[1] = f[2] = 0;
        String xyz = "";
        for (Text val : values) {
            String[] line1 = val.toString().split("#");
            f[0] = Integer.parseInt(line1[0]) + f[0];
            f[1] = Integer.parseInt(line1[1]) + f[1];
            f[2] = Integer.parseInt(line1[2]) + f[2];
        }
        xyz = f[0] + "\t" + f[1] + "\t" + f[2];
        context.write(key, new Text(xyz));
        f[0] = f[1] = f[2] = 0;
    }
}
Driver:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Agger {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.printf("Usage: Agger <input dir> <output dir>\n");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(Agger.class);
        job.setJobName("Aggregation");
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(AMapper.class);
        job.setReducerClass(AReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
That is the code.
This is the error I get:
15/03/19 04:53:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/03/19 04:53:23 INFO input.FileInputFormat: Total input paths to process : 1
15/03/19 04:53:23 WARN snappy.LoadSnappy: Snappy native library is available
15/03/19 04:53:23 INFO snappy.LoadSnappy: Snappy native library loaded
15/03/19 04:53:23 INFO mapred.JobClient: Running job: job_201503190444_0003
15/03/19 04:53:24 INFO mapred.JobClient: map 0% reduce 0%
15/03/19 04:53:29 INFO mapred.JobClient: map 100% reduce 0%
15/03/19 04:53:33 INFO mapred.JobClient: Task Id : attempt_201503190444_0003_r_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:470)
at java.lang.Integer.parseInt(Integer.java:499)
at AReducer.reduce(AReducer.java:17)
at AReducer.reduce(AReducer.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201503190444_0003_r_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201503190444_0003_r_000000_0: log4j:WARN Please initialize the log4j system properly.
attempt_201503190444_0003_r_000000_0: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
15/03/19 04:53:38 INFO mapred.JobClient: Task Id : attempt_201503190444_0003_r_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:470)
at java.lang.Integer.parseInt(Integer.java:499)
at AReducer.reduce(AReducer.java:17)
at AReducer.reduce(AReducer.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201503190444_0003_r_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201503190444_0003_r_000000_1: log4j:WARN Please initialize the log4j system properly.
attempt_201503190444_0003_r_000000_1: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
15/03/19 04:53:42 INFO mapred.JobClient: Task Id : attempt_201503190444_0003_r_000000_2, Status : FAILED
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:470)
at java.lang.Integer.parseInt(Integer.java:499)
at AReducer.reduce(AReducer.java:17)
at AReducer.reduce(AReducer.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201503190444_0003_r_000000_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201503190444_0003_r_000000_2: log4j:WARN Please initialize the log4j system properly.
attempt_201503190444_0003_r_000000_2: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
15/03/19 04:53:50 INFO mapred.JobClient: Job complete: job_201503190444_0003
15/03/19 04:53:50 INFO mapred.JobClient: Counters: 29
15/03/19 04:53:50 INFO mapred.JobClient: File System Counters
15/03/19 04:53:50 INFO mapred.JobClient: FILE: Number of bytes read=0
15/03/19 04:53:50 INFO mapred.JobClient: FILE: Number of bytes written=182376
15/03/19 04:53:50 INFO mapred.JobClient: FILE: Number of read operations=0
15/03/19 04:53:50 INFO mapred.JobClient: FILE: Number of large read operations=0
15/03/19 04:53:50 INFO mapred.JobClient: FILE: Number of write operations=0
15/03/19 04:53:50 INFO mapred.JobClient: HDFS: Number of bytes read=347
15/03/19 04:53:50 INFO mapred.JobClient: HDFS: Number of bytes written=0
15/03/19 04:53:50 INFO mapred.JobClient: HDFS: Number of read operations=2
15/03/19 04:53:50 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/03/19 04:53:50 INFO mapred.JobClient: HDFS: Number of write operations=0
15/03/19 04:53:50 INFO mapred.JobClient: Job Counters
15/03/19 04:53:50 INFO mapred.JobClient: Failed reduce tasks=1
15/03/19 04:53:50 INFO mapred.JobClient: Launched map tasks=1
15/03/19 04:53:50 INFO mapred.JobClient: Launched reduce tasks=4
15/03/19 04:53:50 INFO mapred.JobClient: Data-local map tasks=1
15/03/19 04:53:50 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=4785
15/03/19 04:53:50 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=19188
15/03/19 04:53:50 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/03/19 04:53:50 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/19 04:53:50 INFO mapred.JobClient: Map-Reduce Framework
15/03/19 04:53:50 INFO mapred.JobClient: Map input records=20
15/03/19 04:53:50 INFO mapred.JobClient: Map output records=20
15/03/19 04:53:50 INFO mapred.JobClient: Map output bytes=253
15/03/19 04:53:50 INFO mapred.JobClient: Input split bytes=94
15/03/19 04:53:50 INFO mapred.JobClient: Combine input records=0
15/03/19 04:53:50 INFO mapred.JobClient: Combine output records=0
15/03/19 04:53:50 INFO mapred.JobClient: Spilled Records=20
15/03/19 04:53:50 INFO mapred.JobClient: CPU time spent (ms)=400
15/03/19 04:53:50 INFO mapred.JobClient: Physical memory (bytes) snapshot=150429696
15/03/19 04:53:50 INFO mapred.JobClient: Virtual memory (bytes) snapshot=387174400
15/03/19 04:53:50 INFO mapred.JobClient: Total committed heap usage (bytes)=160501760
I'm trying to get output from the mapper of the form {'key1', 1#2#3, 2#3#4, 3#4#5}.
That output is fed to the reducer, which splits the values and sums them up. But somehow the code isn't working.
Any help or guidance would be appreciated. Also, if anyone can point me to a good site for learning Hadoop, that would be a big help, so please do!
Accepted answer
I don't think there is anything wrong with your program's logic. You are getting the exception because Java is trying to convert the empty string "" to an integer. This error typically occurs when there are extra spaces or blank lines in the data file (a text file).
To avoid this kind of error, wrap the parsing logic on both the map side and the reduce side in a try/catch.
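As a minimal sketch of that idea (not part of the original answer — the `safeParse`/`sumFields` helper names and the standalone test harness are mine), the reduce-side summation could guard each `Integer.parseInt` call so a blank or malformed record contributes 0 instead of killing the reduce task:

```java
import java.util.Arrays;

public class SafeSum {
    // Defensive parse: a blank or non-numeric token (e.g. from a stray
    // header line or trailing newline in the input file) yields 0
    // instead of throwing NumberFormatException.
    static int safeParse(String s) {
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Sum the three '#'-separated fields across all values for one key,
    // skipping records that do not have exactly three fields
    // (an empty mapper value splits into a single empty token).
    static int[] sumFields(Iterable<String> values) {
        int[] f = new int[3];
        for (String val : values) {
            String[] parts = val.split("#");
            if (parts.length != 3) continue; // skip malformed record
            for (int i = 0; i < 3; i++) {
                f[i] += safeParse(parts[i]);
            }
        }
        return f;
    }

    public static void main(String[] args) {
        // "" mimics the blank record that triggered the original
        // exception; "4# #5" has a blank middle field.
        int[] f = sumFields(Arrays.asList("12#1#5", "23#1#5", "", "4# #5"));
        System.out.println(Arrays.toString(f)); // prints [39, 2, 15]
    }
}
```

The same guard applied inside `AReducer.reduce` (and, if the input file may contain a header row, inside `AMapper.map`) would let the job complete while silently skipping the bad record; logging or counting skipped records with a Hadoop counter would make the data problem visible instead of hidden.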
This question ("java - Hadoop Reducer code parseInt error") originated on Stack Overflow: https://stackoverflow.com/questions/29140819/