hadoop - running independent map reduce jobs one after another

Tags: hadoop mapreduce bigdata

Is it possible to execute independent MapReduce jobs (not chained, i.e. one job's reducer output does not become the next job's mapper input) one after another?

Best Answer

In your driver code, call two methods, runFirstJob and runSecondJob, one after the other, like this. This is just a hint; adapt it to your needs.

public class ExerciseDriver {


static Configuration conf;

public static void main(String[] args) throws Exception {

    ExerciseDriver ED = new ExerciseDriver();
    conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    if(args.length < 4) {
        System.out.println("Too few arguments. Usage: <job1 hdfs input> <job1 hdfs output> <job2 hdfs input> <job2 hdfs output>");
        System.exit(1);
    }

    String pathin1stmr = args[0];
    String pathout1stmr = args[1];
    String pathin2ndmr = args[2];
    String pathout2ndmr = args[3];

    ED.runFirstJob(pathin1stmr, pathout1stmr);

    ED.runSecondJob(pathin2ndmr, pathout2ndmr);

}

public int runFirstJob(String pathin, String pathout) throws Exception {

    Job job = new Job(conf);
    job.setJarByClass(ExerciseDriver.class);
    job.setMapperClass(ExerciseMapper1.class);
    job.setCombinerClass(ExerciseCombiner.class);
    job.setReducerClass(ExerciseReducer1.class);
    job.setInputFormatClass(ParagrapghInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class); 
    FileInputFormat.addInputPath(job, new Path(pathin));
    FileOutputFormat.setOutputPath(job, new Path(pathout));

    // waitForCompletion() submits the job itself and blocks until it finishes,
    // so a separate job.submit() call is not needed here
    boolean success = job.waitForCompletion(true);
    return success ? 0 : -1;

}

  public int runSecondJob(String pathin, String pathout) throws Exception { 
    Job job = new Job(conf);
    job.setJarByClass(ExerciseDriver.class);
    job.setMapperClass(ExerciseMapper2.class);
    job.setReducerClass(ExerciseReducer2.class);
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);    
    FileInputFormat.addInputPath(job,new Path(pathin));
    FileOutputFormat.setOutputPath(job, new Path(pathout));
    boolean success = job.waitForCompletion(true);
    return success ? 0 : -1;
}

}
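As an alternative to calling the two run methods by hand, Hadoop also ships a helper for declaring "job B runs after job A" dependencies: `JobControl` and `ControlledJob` from `org.apache.hadoop.mapreduce.lib.jobcontrol`. A rough sketch under the assumption that `job1` and `job2` are two `Job` objects configured as in the driver above (this requires a Hadoop classpath and is not runnable on its own):

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class JobControlDriver {

    // Sketch only: job1 and job2 are assumed to be fully configured
    // (mapper, reducer, input/output paths) before this is called.
    public static void runSequentially(Job job1, Job job2) throws Exception {
        ControlledJob first = new ControlledJob(job1.getConfiguration());
        first.setJob(job1);
        ControlledJob second = new ControlledJob(job2.getConfiguration());
        second.setJob(job2);

        // second will not be submitted until first has completed successfully
        second.addDependingJob(first);

        JobControl control = new JobControl("two-step");
        control.addJob(first);
        control.addJob(second);

        // JobControl is a Runnable; run it on its own thread and poll
        Thread t = new Thread(control);
        t.setDaemon(true);
        t.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        control.stop();
    }
}
```

The driver-method approach in the answer is simpler when the jobs are truly independent; `JobControl` becomes more useful once there are several jobs with a non-trivial dependency graph.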

For more on running independent map reduce jobs one after another in hadoop, see this similar question on Stack Overflow: https://stackoverflow.com/questions/29507243/
