java - bigdata hadoop java code for wordcount modified

Tags: java hadoop bigdata hadoop2

I have to modify the Hadoop wordcount example so that it counts only the words starting with the prefix "cons", and then sorts the results in descending order of count. Can anyone tell me how to write the mapper and reducer code for this?

Code:

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> 
{ 
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException 
    { 
        //Replacing all digits and punctuation with an empty string 
        String line =  value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        //Extracting the words 
        StringTokenizer record = new StringTokenizer(line); 
        //Emitting each word as a key and one as its value 
        while (record.hasMoreTokens()) 
            context.write(new Text(record.nextToken()), new IntWritable(1)); 
    } 
}

Best Answer

To count the number of words starting with "cons", you can discard all other words when emitting from the mapper.

public void map(Object key, Text value, Context context) throws IOException,
        InterruptedException {
    IntWritable one = new IntWritable(1);
    String[] words = value.toString().split(" ");
    for (String word : words) {
        if (word.startsWith("cons"))
              context.write(new Text("cons_count"), one);
    }
}

Now the reducer will receive only a single key, cons_count, and you can sum up the values to get the count.
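The filter-and-sum logic above can be checked locally without a cluster. This is a minimal plain-Java sketch (no Hadoop dependencies); the class and method names `ConsCount`/`countCons` are hypothetical, chosen only for illustration:

```java
import java.util.Arrays;

public class ConsCount {
    // Mimics the mapper filter plus the reducer sum: split the line on
    // whitespace, keep only words starting with "cons", and count them.
    static long countCons(String line) {
        return Arrays.stream(line.split("\\s+"))
                     .filter(w -> w.startsWith("cons"))
                     .count();
    }

    public static void main(String[] args) {
        // "construct", "consistent" and "console" match; the other words do not.
        System.out.println(countCons("construct a consistent console design")); // prints 3
    }
}
```

On a real cluster the summation happens in the reducer, which iterates the `IntWritable` ones emitted for the single `cons_count` key and adds them up.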

To sort the words starting with "cons" by frequency, all words starting with "cons" should be routed to the same reducer, which then aggregates and sorts them. To do that:
public class MyMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context) throws IOException,
            InterruptedException {
        String[] words = value.toString().split(" ");
        for (String word : words) {
            if (word.startsWith("cons"))
                context.write(new Text("cons"), new Text(word));
        }
    }
}

Reducer:
public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Map<String, Integer> wordCountMap = new HashMap<String, Integer>();
        for (Text value : values) {
            String word = value.toString();
            if (wordCountMap.containsKey(word)) {
                wordCountMap.put(word, wordCountMap.get(word) + 1);
            } else {
                wordCountMap.put(word, 1);
            }
        }

        // use some sorting mechanism to sort the map based on values.
        // ...

        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}

Regarding java - bigdata hadoop java code for wordcount modified, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26170827/
