hadoop - Why is there no mapper/reducer set for Hadoop TeraSort?

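I plan to insert some code into the mapper of the TeraSort class in Hadoop 0.20.2. However, looking through the source code, I can't find the part that implements the mapper. Normally we would see a call to job.setMapperClass(), which specifies the mapper class. For TeraSort, though, I can only see calls such as setInputFormat and setOutputFormat. I can't find where the map and reduce methods are invoked. Can anyone give me some hints on this? Thanks. The source code is as follows: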

public int run(String[] args) throws Exception {
   LOG.info("starting");
   JobConf job = (JobConf) getConf();
   Path inputDir = new Path(args[0]);
   inputDir = inputDir.makeQualified(inputDir.getFileSystem(job));
   Path partitionFile = new Path(inputDir, TeraInputFormat.PARTITION_FILENAME);
   URI partitionUri = new URI(partitionFile.toString() +
                           "#" + TeraInputFormat.PARTITION_FILENAME);
   TeraInputFormat.setInputPaths(job, new Path(args[0]));
   FileOutputFormat.setOutputPath(job, new Path(args[1]));
   job.setJobName("TeraSort");
   job.setJarByClass(TeraSort.class);
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(Text.class);
   job.setInputFormat(TeraInputFormat.class);
   job.setOutputFormat(TeraOutputFormat.class);
   job.setPartitionerClass(TotalOrderPartitioner.class);
   TeraInputFormat.writePartitionFile(job, partitionFile);
   DistributedCache.addCacheFile(partitionUri, job);
   DistributedCache.createSymlink(job);
   job.setInt("dfs.replication", 1);
   // TeraOutputFormat.setFinalSync(job, true);
   job.setNumReduceTasks(0);
   JobClient.runJob(job);
   LOG.info("done");
   return 0;
 }

For other classes, such as TeraValidate, we can find code like the following:

job.setMapperClass(ValidateMapper.class);
job.setReducerClass(ValidateReducer.class);

I can't see any such calls for TeraSort.

Thanks,

Best Answer

Why would a sort need Mapper and Reducer classes set for it?

The defaults are the standard Mapper (formerly the IdentityMapper) and the standard Reducer. These are the classes you normally extend.
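To make "identity" concrete, here is a minimal plain-Java sketch that does not use Hadoop at all. MapFn and ReduceFn are hypothetical stand-ins loosely modeled on the shape of map/reduce functions, not the real org.apache.hadoop API. When neither method is overridden, every key/value pair passes through unchanged, which is the behavior of the default mapper and reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class IdentityDefaults {

    // Hypothetical stand-in for a map function; NOT Hadoop's Mapper API.
    interface MapFn<K, V> {
        // Default: emit the input pair unchanged, like an identity mapper.
        default List<Map.Entry<K, V>> map(K key, V value) {
            return List.of(Map.entry(key, value));
        }
    }

    // Hypothetical stand-in for a reduce function; NOT Hadoop's Reducer API.
    interface ReduceFn<K, V> {
        // Default: emit every value for the key unchanged, like an identity reducer.
        default List<Map.Entry<K, V>> reduce(K key, List<V> values) {
            List<Map.Entry<K, V>> out = new ArrayList<>();
            for (V v : values) {
                out.add(Map.entry(key, v));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Nothing is overridden, so both objects use the identity defaults.
        MapFn<String, String> mapper = new MapFn<String, String>() {};
        ReduceFn<String, String> reducer = new ReduceFn<String, String>() {};
        System.out.println(mapper.map("k1", "v1"));                  // prints [k1=v1]
        System.out.println(reducer.reduce("k1", List.of("a", "b"))); // prints [k1=a, k1=b]
    }
}
```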

You could basically say that you just emit everything from the input and let Hadoop do its own sorting. So the sort comes "by default".
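The following is a rough in-memory simulation of that idea, written in plain Java without Hadoop. It assumes hypothetical split points (standing in for the ones TeraInputFormat.writePartitionFile samples) and uses a binary search over them, in the spirit of a total-order partitioner, so that every key sent to reducer i is smaller than every key sent to reducer i + 1. Each reducer's input is sorted by the framework, identity map and reduce leave the records untouched, and concatenating the reducer outputs in order gives a globally sorted result. Note that this relies on the job having reduce tasks: with setNumReduceTasks(0), as in the code in the question, map output is written directly and no sort takes place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortBySkeleton {

    // Reducer index for a key = number of split points <= key.
    // Split points must be sorted, which makes the partitioning total-order.
    static int partition(String key, List<String> splitPoints) {
        int idx = Collections.binarySearch(splitPoints, key);
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    static List<String> run(List<String> input, List<String> splitPoints) {
        int numReducers = splitPoints.size() + 1;
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        // Map phase with an identity mapper: each record is emitted unchanged
        // and routed to the reducer chosen by the partitioner.
        for (String record : input) {
            buckets.get(partition(record, splitPoints)).add(record);
        }
        List<String> output = new ArrayList<>();
        for (List<String> bucket : buckets) {
            // Shuffle/sort phase: the framework sorts each reducer's input by key.
            Collections.sort(bucket);
            // Identity reducer writes records as-is; appending buckets in order
            // mimics reading the part files one after another.
            output.addAll(bucket);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "zebra", "kiwi", "banana", "mango");
        List<String> splits = Arrays.asList("g", "n"); // hypothetical sampled split points
        System.out.println(run(input, splits)); // prints [apple, banana, kiwi, mango, pear, zebra]
    }
}
```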

Regarding "hadoop - Why is there no mapper/reducer set for Hadoop TeraSort?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/6565255/
