java - Running a simple MapReduce job to search for a string in Hadoop log files

Tags: java eclipse hadoop sandbox hortonworks-data-platform

When I run it in Eclipse with an input file on the local file system, the MapReduce job works fine. But when I run the jar in the Hortonworks Sandbox with the input file placed in HDFS, the stringKey variable is never set, i.e. stringKey is null in the mapper even though I assign it in the main function and can access it there. Is there a mistake in my code?

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;


    public class StringSearch {
        static String stringKey;
        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                            throws IOException {
                String line = value.toString();
                System.out.println(StringSearch.stringKey);
                if(StringSearch.stringKey != null)
                {
                    if(line.contains(StringSearch.stringKey))
                    {
                        word.set(line);
                        output.collect(word, one);
                    }
                }
            }

        }
        public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                            throws IOException {
                int sum = 0;
                //Iterate through all the values with respect to a key and
                //sum up all of them
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                //Push to the output collector the Key and the obtained
                //sum as value
                output.collect(key, new IntWritable(sum));

            }
        }
        public static class Main {
            public static void main(String[] args) throws Exception {
                if(args.length > 2)
                {
                    stringKey = args[2];
                    System.out.println(stringKey);
                }

                //creating a JobConf object and assigning a job name for identification purposes
                JobConf conf = new JobConf(StringSearch.class);
                conf.setJobName("StringSearch");
                //Setting the configuration object with the data types of the output key and value
                //for map and reduce; if you have different output types there are other set methods
                //for them
                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);
                conf.setMapperClass(Map.class);
                conf.setCombinerClass(Reduce.class); //set the Combiner class
                conf.setReducerClass(Reduce.class);
                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);
                //the hdfs input and output directory to be fetched from the command line
                FileInputFormat.setInputPaths(conf, new Path(args[0]));
                FileOutputFormat.setOutputPath(conf, new Path(args[1]));
                //submits the job to MapReduce and returns only after the job has completed
                JobClient.runJob(conf);
            }

        }

    }  

Best answer

You are trying to access a Java variable from the driver inside the Hadoop mappers, which is not possible: on the cluster the map tasks run in separate JVMs, so a static field set in main() is null there. Instead of stringKey = args[2];, put the value into the job configuration with conf.set("stringkey", args[2]), then read it back in the mapper/reducer with conf.get("stringkey").
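
A minimal sketch of that suggestion, staying with the old org.apache.hadoop.mapred API from the question (imports as in the question's code); the configuration key "stringkey" follows the answer and the field name searchString is just illustrative:

    // In main(): put the search string into the job configuration instead of a
    // static field, so it is shipped to the tasks together with the job.
    JobConf conf = new JobConf(StringSearch.class);
    conf.set("stringkey", args[2]);

    // In the mapper: read it back in configure(), which the framework calls
    // once per task before any map() calls.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private String searchString;

        @Override
        public void configure(JobConf job) {
            searchString = job.get("stringkey");
        }

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            if (searchString != null && line.contains(searchString)) {
                output.collect(new Text(line), new IntWritable(1));
            }
        }
    }

This works because configure() runs inside each map task's own JVM, where the value is available from the job configuration even though the driver's static fields are not.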

A similar question about "java - Running a simple MapReduce job to search for a string in Hadoop log files" can be found on Stack Overflow: https://stackoverflow.com/questions/31583431/
