Hadoop PathFilter configuration is null

Tags: hadoop

I have a path filter that looks like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class AvroFileInclusionFilter extends Configured implements PathFilter {
  Configuration conf;

  @Override
  public void setConf(Configuration conf) {
      this.conf = conf;
  }

  @Override
  public boolean accept(Path path) {
      System.out.println("FileInclusion: " + conf.get("fileInclusion"));
      return true;
  }
}

I explicitly set the fileInclusion property in the configuration. For some reason, the configuration seen inside the path filter is not the same one I set on the job, which is set up as follows:

    Job job = Job.getInstance(getConf(), "Stock Updater");

    job.getConfiguration().set("outputPath", opts.outputPath);

    String[] inputPaths = findPathsForDays(job.getConfiguration(),
            new Path(opts.inputPath), findDaysToQuery(job.getConfiguration(),
                    opts.updatefile)).toArray(new String[]{});
    job.getConfiguration().set("fileInclusion", "hello");

    AvroKeyValueInputFormat.addInputPath(job, new Path(opts.inputPath));
    job.getConfiguration().set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());

    job.setInputFormatClass(AvroKeyValueInputFormat.class);

    LazyOutputFormat.setOutputFormatClass(job, AvroKeyValueOutputFormat.class);
    AvroKeyValueOutputFormat.setOutputPath(job, new Path(opts.outputPath));

    job.addCacheFile(new Path(opts.updatefile).toUri());

    AvroKeyValueOutputFormat.setCompressOutput(job, true);
    job.getConfiguration().set(AvroJob.CONF_OUTPUT_CODEC, snappyCodec().toString());

    AvroJob.setInputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setInputValueSchema(job, StockUpdated.SCHEMA$);
    AvroJob.setMapOutputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setMapOutputValueSchema(job, StockUpdated.SCHEMA$);
    AvroJob.setOutputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setOutputValueSchema(job, StockUpdated.SCHEMA$);

    job.setMapperClass(StockUpdaterMapper.class);
    job.setReducerClass(StockUpdaterReducer.class);

    AvroMultipleOutputs.addNamedOutput(job, "output", AvroKeyValueOutputFormat.class,
            DateKey.SCHEMA$, StockUpdated.SCHEMA$);

    job.setJarByClass(getClass());

    boolean success = job.waitForCompletion(true);

conf.get("fileInclusion") is always null, and I can't figure out why. I have been working on this for quite a while and I'm just about exhausted. Why is the configuration not the same? I am submitting the job with "hadoop jar" and "yarn jar".

Best Answer

Instead of creating the Job object by passing getConf() as an argument, try the following:

Configuration conf = new Configuration();
conf.set("outputPath", opts.outputPath);
conf.set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());
// ... remaining configuration ...

// After setting the required key/value pairs on the Configuration object,
// create the Job object by supplying that conf
Job job = new Job(conf, "Stock Updater");
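A supplementary note on why conf can end up null in the first place: Hadoop instantiates the path filter reflectively via ReflectionUtils.newInstance, which calls setConf only when the instance implements Configurable and a non-null Configuration was handed to it. It is also worth checking the property key: the new mapreduce-API FileInputFormat (which AvroKeyValueInputFormat extends) looks the filter up under mapreduce.input.pathFilter.class, not the old-API key mapred.input.pathFilter.class used in the question; the safer route is the helper FileInputFormat.setInputPathFilter(job, AvroFileInclusionFilter.class). The following is a minimal, self-contained sketch of that injection contract, with no Hadoop dependency (a plain Map stands in for Configuration, and Filter stands in for AvroFileInclusionFilter):

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Sketch of the Configurable/ReflectionUtils.newInstance contract: the
// configuration is injected only when the framework actually passes one in.
public class InjectionSketch {
    public interface Configurable { void setConf(Map<String, String> conf); }

    // Stand-in for a PathFilter such as AvroFileInclusionFilter.
    public static class Filter implements Configurable {
        Map<String, String> conf;

        @Override
        public void setConf(Map<String, String> conf) { this.conf = conf; }

        public String get(String key) {
            return conf == null ? null : conf.get(key);
        }
    }

    // Mirrors ReflectionUtils.newInstance: construct via no-arg constructor,
    // then inject the conf only if the instance is Configurable AND a conf
    // was supplied by the caller.
    public static <T> T newInstance(Class<T> cls, Map<String, String> conf)
            throws Exception {
        Constructor<T> ctor = cls.getDeclaredConstructor();
        T instance = ctor.newInstance();
        if (instance instanceof Configurable && conf != null) {
            ((Configurable) instance).setConf(conf);
        }
        return instance;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("fileInclusion", "hello");

        // Instantiated with the job conf: the property is visible.
        Filter good = newInstance(Filter.class, jobConf);
        System.out.println(good.get("fileInclusion")); // hello

        // Instantiated without a conf (e.g. registered under a key the input
        // format never reads, so the job conf is never handed over): null.
        Filter bad = newInstance(Filter.class, null);
        System.out.println(bad.get("fileInclusion")); // null
    }
}
```

In real Hadoop code the equivalent of the "good" path is registering the filter through FileInputFormat.setInputPathFilter rather than setting the property string by hand, so the key always matches what the input format reads.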

Regarding "Hadoop PathFilter configuration is null", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22928420/
