I need to modify the Hadoop WordCount example to count the number of words beginning with the prefix "cons", and then sort the results in descending order. Can anyone tell me how to write the mapper and reducer code for this?
Code:
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Replace all digits and punctuation with an empty string
        String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        // Extract the words
        StringTokenizer record = new StringTokenizer(line);
        // Emit each word as a key, with one as its value
        while (record.hasMoreTokens())
            context.write(new Text(record.nextToken()), new IntWritable(1));
    }
}
Best answer
To count the number of words starting with "cons", you can discard all other words when emitting from the mapper.
public void map(Object key, Text value, Context context) throws IOException,
        InterruptedException {
    IntWritable one = new IntWritable(1);
    String[] words = value.toString().split(" ");
    for (String word : words) {
        if (word.startsWith("cons"))
            context.write(new Text("cons_count"), one);
    }
}
Now the reducer will receive only a single key, cons_count, and you can sum its values to get the count.
To sort the words starting with "cons" by frequency, all such words must be routed to the same reducer, which then aggregates and sorts them. To do that:
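A minimal sketch of that summing reducer, assuming the mapper above emits ("cons_count", 1) for every matching word (the class name ConsCountReducer is illustrative, not from the original post):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ConsCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Add up the 1s the mapper emitted for the single key "cons_count"
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Emit one record: ("cons_count", total number of matching words)
        context.write(key, new IntWritable(sum));
    }
}
```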
public class MyMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException,
            InterruptedException {
        String[] words = value.toString().split(" ");
        for (String word : words) {
            if (word.startsWith("cons"))
                context.write(new Text("cons"), new Text(word));
        }
    }
}
Reducer:
public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Map<String, Integer> wordCountMap = new HashMap<String, Integer>();
        // Tally how many times each "cons" word occurs
        for (Text value : values) {
            String word = value.toString();
            if (wordCountMap.containsKey(word)) {
                wordCountMap.put(word, wordCountMap.get(word) + 1);
            } else {
                wordCountMap.put(word, 1);
            }
        }
        // Use some sorting mechanism to sort the map based on values.
        // ...
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}
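One way to fill in the "sorting mechanism" placeholder is to copy the map's entries into a list and sort it by value in descending order. A sketch in plain Java (ConsSortSketch is a hypothetical name; only the sorting idea comes from the answer above):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConsSortSketch {
    // Returns the (word, count) entries ordered by count, highest first
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("consider", 3);
        counts.put("constant", 5);
        counts.put("console", 1);
        for (Map.Entry<String, Integer> e : sortByCountDesc(counts)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In the reducer, the final loop would then iterate over the sorted list instead of wordCountMap.entrySet().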
Regarding java - bigdata hadoop java code for wordcount modified, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26170827/