java - Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf

Tags: java hadoop mapreduce

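I am a new Hadoop user. My program is supposed to skip bad records in MapReduce, but the bad data was not being skipped. So before worrying about skipping, I want to find out which error actually occurs. For that, I added myCustomRunJob() to see why the bad records are not skipped, and for now I have removed the skipping code. Although I have set the output path, running the program fails with this problem: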

import java.io.IOException;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.*;

public class SkipData
{
    public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable>
    {
        private final static LongWritable one = new LongWritable(1);
        private Text word = new Text("totalcount");

        public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException
        {
            String line = value.toString();
            if (line.equals("skiptext"))
                throw new RuntimeException("Found skiptext");

            output.collect(word, one);
        }
    }

    public static RunningJob myCustomRunJob(JobConf job) throws Exception
    {
        JobClient jc = new JobClient(job);
        RunningJob rj = jc.submitJob(job);
        if (!jc.monitorAndPrintJob(job, rj)) {
            throw new IOException("Job failed with info: " + rj.getFailureInfo());
        }
        return rj;
    }

    public static void main(String[] args) throws Exception
    {
        System.setProperty("hadoop.home.dir", "/");
        Configuration config = new Configuration();
        JobConf conf = new JobConf(config, SkipData.class);
        RunningJob result = myCustomRunJob(conf);

        conf.setJobName("SkipData");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);

        FileInputFormat.setInputPaths(conf, args[0]);
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

I have tried many times to get past this error. I am using the old API. How can I fix this?
18/02/28 11:05:28 DEBUG security.UserGroupInformation: PrivilegedActionException as:saung (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.
18/02/28 11:05:28 DEBUG security.UserGroupInformation: PrivilegedActionException as:saung (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.
Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.
 at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
 at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
 at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
 at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
 at mapredpack.SkipData.myCustomRunJob(SkipData.java:90)
 at mapredpack.SkipData.main(SkipData.java:140)

Best answer

You are trying to run the job twice. The first attempt comes from calling

RunningJob result=myCustomRunJob(conf);

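this early. That call submits the job before any of its configuration has been set, including the output path. Submission then fails in the output check (FileOutputFormat.checkOutputSpecs in the stack trace). I would remove that line, along with the myCustomRunJob(JobConf job) method. The JobClient.runJob(conf) call at the very bottom will take care of running the job.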
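Below is a small, self-contained Java model of why the order matters. It does not use Hadoop at all. The names ToyJobConf, ToyJobClient and the keys job.name, input.dir and output.dir are hypothetical stand-ins. The model copies only one behavior, the one the stack trace shows: the output directory is checked at submission time. A job submitted before its output path is set therefore fails with the same message.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Hadoop code: ToyJobConf and ToyJobClient are hypothetical
// stand-ins used only to illustrate "configure first, then submit once".
public class SubmitOrderDemo {

    // Hypothetical stand-in for JobConf: a plain bag of key/value settings.
    static class ToyJobConf {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Hypothetical stand-in for job submission. The check mirrors the
    // message thrown by FileOutputFormat.checkOutputSpecs in the trace.
    static class ToyJobClient {
        static String submit(ToyJobConf conf) {
            if (conf.get("output.dir") == null) {
                throw new IllegalStateException("Output directory not set in JobConf.");
            }
            return "submitted " + conf.get("job.name") + " -> " + conf.get("output.dir");
        }
    }

    // Broken order, as in the question: submit first, configure afterwards.
    static String submitBeforeConfiguring(String in, String out) {
        ToyJobConf conf = new ToyJobConf();
        String result = ToyJobClient.submit(conf); // throws: nothing is set yet
        conf.set("job.name", "SkipData");
        conf.set("input.dir", in);
        conf.set("output.dir", out);
        return result;
    }

    // Fixed order, as in the answer: set everything, then submit once.
    static String configureThenSubmit(String in, String out) {
        ToyJobConf conf = new ToyJobConf();
        conf.set("job.name", "SkipData");
        conf.set("input.dir", in);
        conf.set("output.dir", out);
        return ToyJobClient.submit(conf);
    }

    public static void main(String[] args) {
        try {
            submitBeforeConfiguring("in", "out");
        } catch (IllegalStateException e) {
            System.out.println("broken order: " + e.getMessage());
        }
        System.out.println("fixed order: " + configureThenSubmit("in", "out"));
    }
}
```

Running main prints the error for the broken order and a success line for the fixed order. In the real program, the corresponding fix is the one the answer gives: call every setter on conf first, then submit once at the end. If you want to keep myCustomRunJob for its failure info, calling it in place of JobClient.runJob(conf) after all the setters should also work. That is an untested suggestion, not part of the original answer.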

Regarding "java - Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49022332/
