java - Hadoop MapReduce practice

Tags: java hadoop mapreduce

Input data file:

name,month,category,expenditure

hitesh,1,A1,10020  
hitesh,2,A2,10300  
hitesh,3,A3,10400  
hitesh,4,A4,11000  
hitesh,5,A1,21000  
hitesh,6,A2,5000  
hitesh,7,A3,9000  
hitesh,8,A4,1000  
hitesh,9,A1,111000    
hitesh,10,A2,12000  
hitesh,11,A3,71000  
hitesh,12,A4,177000    
kuwar,1,A1,10700  
kuwar,2,A2,17000  
kuwar,3,A3,10070  
kuwar,4,A4,10007   
Compute the person-wise total expenditure and the number of distinct categories the money was spent on. (The output needs to look like: name, total expenditure, total number of unique categories.)
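For reference, working through the sample data above by hand (my own arithmetic, not the output of an actual run), the desired output would look something like:

hitesh  448720,4
kuwar   47777,4

that is, hitesh spends 448720 in total across 4 distinct categories (A1..A4), and kuwar spends 47777 across the same 4 categories.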
What I have tried so far (my code):

Person-wise total expenditure:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Emp
{
    public static class MyMap extends Mapper<LongWritable,Text,Text,IntWritable>
    {
        public void map(LongWritable k, Text v, Context con)
        throws IOException, InterruptedException
        {
            String line = v.toString();
            String[] w = line.split(",");
            String person = w[0];
            int exp = Integer.parseInt(w[3]);
            // emit (name, expenditure) for every record
            con.write(new Text(person), new IntWritable(exp));
        }
    }

    public static class MyRed extends Reducer<Text,IntWritable,Text,IntWritable>
    {
        public void reduce(Text k, Iterable<IntWritable> vlist, Context con)
        throws IOException, InterruptedException
        {
            int tot = 0;
            // sum all expenditures for this person
            for (IntWritable v : vlist)
                tot += v.get();
            con.write(k, new IntWritable(tot));
        }
    }

    public static void main(String[] args) throws Exception
    {
        Configuration c = new Configuration();
        Job j = new Job(c, "person-wise");
        j.setJarByClass(Emp.class);
        j.setMapperClass(MyMap.class);
        j.setReducerClass(MyRed.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        Path p1 = new Path(args[0]);
        Path p2 = new Path(args[1]);
        FileInputFormat.addInputPath(j, p1);
        FileOutputFormat.setOutputPath(j, p2);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}
How do I get the total number of unique categories in this program, and how do I make the output look like: name, total expenditure, total number of unique categories?

Thanks

Best answer

I made a few modifications to your code; hope this is useful. The idea is: the mapper now emits the whole record as a Text value keyed by the person's name, so the reducer sees both the category and the expenditure of every record for that person. The reducer sums the expenditures and collects the categories into a HashSet, so the number of unique categories is simply the size of that set. Note that the job's output value class has to change from IntWritable to Text to match the new reducer output.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Emp
{
    public static class MyMap extends Mapper<LongWritable,Text,Text,Text>
    {
        public void map(LongWritable k, Text v, Context con)
        throws IOException, InterruptedException
        {
            String line = v.toString();
            String[] w = line.split(",");
            String person = w[0];
            // emit the whole record keyed by person, so the reducer can
            // see both the category and the expenditure of each record
            con.write(new Text(person), new Text(line));
        }
    }

    public static class MyRed extends Reducer<Text,Text,Text,Text>
    {
        public void reduce(Text k, Iterable<Text> vlist, Context con)
        throws IOException, InterruptedException
        {
            int tot = 0;
            Set<String> cat = new HashSet<String>();
            for (Text v : vlist) {
                String[] dataArray = v.toString().split(",");
                tot += Integer.parseInt(dataArray[3]); // calculating the total spend
                cat.add(dataArray[2]);                 // collecting the unique categories
            }
            // writing the name, total spend and number of unique categories to the output
            con.write(k, new Text(tot + "," + cat.size()));
        }
    }

    public static void main(String[] args) throws Exception
    {
        Configuration c = new Configuration();
        Job j = new Job(c, "person-wise");
        j.setJarByClass(Emp.class);
        j.setMapperClass(MyMap.class);
        j.setReducerClass(MyRed.class);
        j.setOutputKeyClass(Text.class);
        // the reducer emits Text values now, so the output value class must be Text
        j.setOutputValueClass(Text.class);
        Path p1 = new Path(args[0]);
        Path p2 = new Path(args[1]);
        FileInputFormat.addInputPath(j, p1);
        FileOutputFormat.setOutputPath(j, p2);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}
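To try it out, here is a minimal sketch of compiling and running the job; the jar name and HDFS paths are only placeholders for this example:

javac -cp $(hadoop classpath) Emp.java
jar cf emp.jar Emp*.class
hadoop jar emp.jar Emp /user/hduser/expenses.csv /user/hduser/person-wise-out
hdfs dfs -cat /user/hduser/person-wise-out/part-r-00000

The output directory must not exist before the run, and each output line is the person's name, a tab, and then "totalSpend,uniqueCategoryCount".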

Regarding java - Hadoop MapReduce practice, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32988353/
