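I am trying to sort input data using Hadoop MapReduce. The problem is that I can only sort the key-value pairs by key, whereas I want to sort them by value. The key for each value is created with a counter, so the first value (234) has key 1, the second value (944) has key 2, and so on. Any idea how to do this and sort the input by value?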
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Sortt {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        // Running counter used as the key; it is local to each mapper instance.
        private int k = 0;
        private final Text ke = new Text();
        private final IntWritable val = new IntWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                val.set(Integer.parseInt(itr.nextToken()));
                k = k + 1;
                ke.set(Integer.toString(k));
                context.write(ke, val);
            }
        }
    }

    public static class SortReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable va = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sorts only the values that share this key; the order of the keys
            // themselves is decided by the framework.
            List<Integer> sorted = new ArrayList<Integer>();
            for (IntWritable val : values) {
                sorted.add(val.get());
            }
            Collections.sort(sorted);
            for (int v : sorted) {
                va.set(v);
                context.write(key, va);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "sort");
        job.setJarByClass(Sortt.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SortReducer.class);
        job.setReducerClass(SortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Measure the job before exiting; code placed after System.exit never runs.
        long startTime = System.currentTimeMillis();
        boolean ok = job.waitForCompletion(true);
        long duration = System.currentTimeMillis() - startTime;
        System.out.println("time=" + duration + "ms");
        System.exit(ok ? 0 : 1);
    }
}
Input:
234
944
241
130
369
470
250
100
250
735
856
659
425
756
123
756
459
754
654
951
753
254
698
741
Expected output:
8	100
15	123
4	130
1	234
3	241
24	241
7	250
9	250
22	254
5	369
13	425
17	459
6	470
19	654
12	659
23	698
10	735
21	753
18	754
14	756
16	756
11	856
2	944
20	951
Current output:
1	234
10	735
11	856
12	659
13	425
14	757
15	123
16	756
17	459
18	754
19	654
2	944
20	951
21	753
22	254
23	698
24	741
3	241
4	130
5	369
6	470
7	250
8	100
9	250
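The key order in the current output (1, 10, 11, ..., 2, 20, ...) is what one would expect when the counter keys are compared as text rather than as numbers, since Hadoop's `Text` keys sort byte-wise. Below is a minimal plain-Java sketch of the difference, with no Hadoop dependency; the class and method names (`KeyOrderSketch`, `sortAsText`, `sortAsNumbers`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class KeyOrderSketch {

    /** Sorts the keys the way strings compare: character by character. */
    static String[] sortAsText(String[] keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Sorts the same keys by their numeric value instead. */
    static String[] sortAsNumbers(String[] keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy, Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return copy;
    }

    public static void main(String[] args) {
        String[] keys = {"1", "2", "3", "10", "20"};
        System.out.println(Arrays.toString(sortAsText(keys)));
        System.out.println(Arrays.toString(sortAsNumbers(keys)));
    }
}
```

With text comparison, "10" lands before "2" because the first characters are compared first, which matches the pattern seen above.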
Best answer
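MapReduce output is sorted by key by default. To sort by value, you can use a secondary sort. Secondary sort is one of the best techniques for sorting reducer output by value; the original answer links to a complete example.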
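As a rough illustration of the idea behind sorting by value, the sketch below puts the value first in a composite key, so the sort step orders pairs by value and only then by the counter. It is plain Java that simulates the map and sort steps in memory; it is not a Hadoop job and not the example linked from the answer. `SortByValueSketch`, `ValueKeyPair`, and `sortByValue` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByValueSketch {

    /** Hypothetical composite key: the value is what drives the sort order. */
    static final class ValueKeyPair {
        final int value; // the number read from the input
        final int key;   // the counter assigned while reading

        ValueKeyPair(int value, int key) {
            this.value = value;
            this.key = key;
        }

        @Override
        public String toString() {
            return key + "\t" + value;
        }
    }

    /** Orders pairs by value, breaking ties by the numeric counter key. */
    static final Comparator<ValueKeyPair> BY_VALUE_THEN_KEY =
            Comparator.comparingInt((ValueKeyPair p) -> p.value)
                      .thenComparingInt(p -> p.key);

    /** Simulates the map step (assign counter keys) followed by the sort step. */
    static List<ValueKeyPair> sortByValue(int[] input) {
        List<ValueKeyPair> pairs = new ArrayList<>();
        int counter = 0;
        for (int v : input) {
            counter++;
            pairs.add(new ValueKeyPair(v, counter));
        }
        pairs.sort(BY_VALUE_THEN_KEY);
        return pairs;
    }

    public static void main(String[] args) {
        // First nine numbers from the question's input.
        int[] input = {234, 944, 241, 130, 369, 470, 250, 100, 250};
        for (ValueKeyPair p : sortByValue(input)) {
            System.out.println(p);
        }
    }
}
```

In an actual job, the same ordering could come from emitting the value as an `IntWritable` map output key (declared with `job.setMapOutputKeyClass`) and running a single reducer (`job.setNumReduceTasks(1)`) so that the output is globally ordered. That wiring is an assumption about one possible setup, not something the answer specifies.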
Regarding java - How to sort values (and their corresponding keys) in the Hadoop MapReduce framework?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55494120/