java - Printing in a MapReduce class

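I have this MapReduce example [1], and I want to print information both to standard output and to a log file [3]. Nothing seems to be printed to the log. How can I make my map class print output?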

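I have also configured yarn-site.xml to retain logs [2]. Although the logs are kept in the /app-logs directory, the userlogs directory, which contains the output of the job execution, is deleted when the job finishes. How can I stop MapReduce from deleting the files in the userlogs directory?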

I am using YARN.

Thanks,

[1] WordCount example containing only the map part.

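import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Mapper;

public class MyWordCount {
  // Extends the new-API (org.apache.hadoop.mapreduce) Mapper as a raw type.
  public static class MyMap extends Mapper {
    Log log = LogFactory.getLog(MyWordCount.class);
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Old-API (org.apache.hadoop.mapred) signature: this 4-argument method does not
    // override the new-API Mapper.map(KEYIN, VALUEIN, Context) that run() calls below.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        System.out.println("HERRE");
        log.info("HERRRRRE");
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }

    // Replaces Mapper.run(): prints every record, then delegates to map(key, value, context).
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                System.out.println("Key: " + context.getCurrentKey() + " Value: " + context.getCurrentValue());
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }

    public void cleanup(Mapper.Context context) {}
  }
}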

[2] yarn-site.xml

    <!-- job history -->
    <property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property>
    <property> <name>yarn.nodemanager.log.retain-seconds</name> <value>900000</value> </property>
    <property> <name>yarn.nodemanager.remote-app-log-dir</name> <value>/app-logs</value> </property>

[3] Log output

Log Type: stderr
Log Upload Time: 24-Sep-2015 12:45:19
Log Length: 317
Java HotSpot(TM) Client VM warning: You have loaded library /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Log Type: stdout
Log Upload Time: 24-Sep-2015 12:45:19
Log Length: 0
Log Type: syslog
Log Upload Time: 24-Sep-2015 12:45:19
Log Length: 2604
2015-09-24 12:45:04,569 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-24 12:45:05,139 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-09-24 12:45:05,412 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-09-24 12:45:05,413 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-09-24 12:45:05,462 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-09-24 12:45:05,463 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1443113036547_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@1b5a082)
2015-09-24 12:45:05,847 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-09-24 12:45:06,915 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /tmp/hadoop-temp/nm-local-dir/usercache/xubuntu/appcache/application_1443113036547_0001
2015-09-24 12:45:07,604 INFO [main]  org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-09-24 12:45:09,402 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2015-09-24 12:45:10,187 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://hadoop-coc-1:9000/input1/b.txt:0+21
2015-09-24 12:45:10,812 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1443113036547_0001_m_000000_0 is done. And is in the process of committing
2015-09-24 12:45:10,969 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1443113036547_0001_m_000000_0 is allowed to commit now
2015-09-24 12:45:10,993 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1443113036547_0001_m_000000_0' to hdfs://192.168.10.110:9000/output1-1442847968/_temporary/1/task_1443113036547_0001_m_000000
2015-09-24 12:45:11,135 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1443113036547_0001_m_000000_0' done.
2015-09-24 12:45:11,135 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2015-09-24 12:45:11,136 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2015-09-24 12:45:11,136 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.

Best answer

I found the error. It was a bug in my code.
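The answer does not say what the bug was. One plausible candidate visible in [1] (an assumption, not confirmed by the poster) is that map is declared with the old org.apache.hadoop.mapred signature (LongWritable, Text, OutputCollector, Reporter), while MyMap extends the new-API org.apache.hadoop.mapreduce.Mapper, whose run loop calls map(KEYIN, VALUEIN, Context). In Java a method with a different parameter list is an overload, not an override, so the inherited identity map runs and "HERRE" is never printed. The plain-Java sketch below reproduces that pitfall with hypothetical stand-in classes (BaseMapper, Ctx, Collector, Reporter) so it runs without Hadoop on the classpath; it illustrates the language rule and is not Hadoop's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class OverrideDemo {

    // Hypothetical stand-in for Mapper.Context: records every write as "key=value".
    static class Ctx {
        final List<String> out = new ArrayList<>();
        void write(Object k, Object v) { out.add(k + "=" + v); }
    }

    // Hypothetical stand-ins for the old-API OutputCollector and Reporter parameters.
    interface Collector { void collect(Object k, Object v); }
    interface Reporter {}

    // Hypothetical stand-in for the new-API Mapper: run() only ever calls map(K, V, Ctx).
    static class BaseMapper<K, V> {
        protected void map(K key, V value, Ctx ctx) {
            ctx.write(key, value); // default behaviour: identity mapping
        }
        public void run(K key, V value, Ctx ctx) {
            map(key, value, ctx);
        }
    }

    // Mirrors the question's mapper: the 4-argument map is an overload, not an
    // override, so BaseMapper.run() never reaches it and the identity map runs.
    static class BrokenMapper extends BaseMapper<Long, String> {
        public void map(Long key, String value, Collector out, Reporter r) {
            System.out.println("HERRE"); // never reached through run()
            StringTokenizer itr = new StringTokenizer(value);
            while (itr.hasMoreTokens()) out.collect(itr.nextToken(), 1);
        }
    }

    // Fixed version: same parameter list as the base method. @Override makes the
    // compiler reject any signature that does not actually override.
    static class FixedMapper extends BaseMapper<Long, String> {
        @Override
        protected void map(Long key, String value, Ctx ctx) {
            StringTokenizer itr = new StringTokenizer(value);
            while (itr.hasMoreTokens()) ctx.write(itr.nextToken(), 1);
        }
    }

    // Feeds one record through run() and returns everything the mapper wrote.
    static List<String> runWith(BaseMapper<Long, String> m, String line) {
        Ctx ctx = new Ctx();
        m.run(0L, line, ctx);
        return ctx.out;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new BrokenMapper(), "a b a")); // [0=a b a]
        System.out.println(runWith(new FixedMapper(), "a b a"));  // [a=1, b=1, a=1]
    }
}
```

In the real class, the corresponding change would be to declare MyMap as extends Mapper<LongWritable, Text, Text, IntWritable>, write the method as @Override protected void map(LongWritable key, Text value, Context context), and emit with context.write(word, one). With the generic parameters and @Override in place, a mismatched signature becomes a compile error instead of a silently unused method.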

Regarding "java - Printing in a MapReduce class", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32776368/
