I wrote some code that works like a SQL GROUP BY.
A sample record from my dataset looks like this:
250788681419,20090906,200937,200909,619,Sunday,Weekend,On-net,Morning,Outgoing,Voice,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,Successfully released by service,17,0,1,21.25,635-10 -112-30455
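import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] attribute = line.split(",");
        // Field 17 holds the value to be summed.
        double rs = Double.parseDouble(attribute[17]);
        // The grouping key is built from fields 5, 8 and 10.
        String comb = attribute[5].concat(attribute[8].concat(attribute[10]));
        context.write(new Text(comb), new DoubleWritable(rs));
    }
}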
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0;
        // The parameter must be Iterable (not Iterator) for this method to override Reducer.reduce.
        for (DoubleWritable val : values) {
            sum += val.get();
        }
        context.write(key, new DoubleWritable(sum));
    }
}
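In the mapper, the value at index 17 is sent to the reducer to be summed. Now I also want to sum the value at index 14. How can I send it to the reducer as well?
Best answer
If your values are all of the same type, creating an ArrayWritable subclass should solve this. The class should look something like: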
public class DblArrayWritable extends ArrayWritable
{
public DblArrayWritable()
{
super(DoubleWritable.class);
}
}
Your mapper class would then look like this:
public class MyMap extends Mapper<LongWritable, Text, Text, DblArrayWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] attribute = line.split(",");
        // Wrap each parsed double in a DoubleWritable before storing it.
        DoubleWritable[] values = new DoubleWritable[2];
        values[0] = new DoubleWritable(Double.parseDouble(attribute[14]));
        values[1] = new DoubleWritable(Double.parseDouble(attribute[17]));
        String comb = attribute[5].concat(attribute[8].concat(attribute[10]));
        // set() is an instance method, so create the array writable first.
        DblArrayWritable out = new DblArrayWritable();
        out.set(values);
        context.write(new Text(comb), out);
    }
}
In your reducer, you should now be able to iterate over the values of each DblArrayWritable.
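The per-column summing that such a reducer would perform can be sketched in plain Java, without Hadoop on the classpath. This is only an illustration under assumptions: each double[] stands in for the contents of one DblArrayWritable (in a real reducer, element i would be read with something like ((DoubleWritable) value.get()[i]).get()), and the class name ColumnSums, the method sumColumns and the second sample row are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnSums {
    // Adds up each position across all rows; every row must hold `width` elements.
    public static double[] sumColumns(Iterable<double[]> rows, int width) {
        double[] sums = new double[width];
        for (double[] row : rows) {
            for (int i = 0; i < width; i++) {
                sums[i] += row[i];
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Two hypothetical records for the same key: {attribute[14], attribute[17]}
        List<double[]> rows = Arrays.asList(
                new double[] {17.0, 21.25},
                new double[] {3.0, 0.75});
        System.out.println(Arrays.toString(sumColumns(rows, 2)));
    }
}
```

In an actual job the two sums would then be written out, for example wrapped in another DblArrayWritable or formatted into a Text value.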
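Judging from your sample data, though, the values may be of different types. You might be able to implement an ObjArrayWritable class for that purpose, but I'm not sure about it and I don't see much that supports it. Note that ArrayWritable's constructor takes a Class<? extends Writable>, so Object.class would not be accepted as written. If it worked, the class would be: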
public class ObjArrayWritable extends ArrayWritable
{
public ObjArrayWritable()
{
super(Object.class);
}
}
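You could also handle this by simply concatenating the values and passing them to the reducer as Text; the reducer would then split them apart again.
Another option is to implement your own Writable class. Here is an example of how that works: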
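A minimal sketch of the concatenate-and-split idea, in plain Java so it runs without Hadoop. The class JoinedValues, the methods pack and unpack, and the ";" delimiter are assumptions made for illustration; in a real job the packed string would be wrapped in a Text on the mapper side and recovered with toString() in the reducer.

```java
public class JoinedValues {
    // Mapper side: pack both numbers into one delimited string.
    public static String pack(double a, double b) {
        return a + ";" + b;
    }

    // Reducer side: split the string back into its numeric parts.
    public static double[] unpack(String packed) {
        String[] parts = packed.split(";");
        return new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])};
    }

    public static void main(String[] args) {
        String packed = pack(17, 21.25);
        double[] values = unpack(packed);
        System.out.println(packed + " -> " + values[0] + ", " + values[1]);
    }
}
```

This approach is simple, but it pays for string formatting and parsing on every record, and the delimiter must never occur inside a value.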
public static class PairWritable implements Writable
{
    private Double myDouble;
    private String myString;
    // Hadoop serialization: Writable interface methods
    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they are written.
        myDouble = in.readDouble();
        myString = in.readUTF();
    }
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(myDouble);
        out.writeUTF(myString);
    }
    // End of Writable implementation
    // Getter and setter methods for myDouble and myString
    public void set(Double d, String s) {
        myDouble = d;
        myString = s;
    }
    public Double getDouble() {
        return myDouble;
    }
    public String getString() {
        return myString;
    }
}
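To see why the order of reads must mirror the order of writes, the serialization round trip can be tried out with the standard java.io streams alone. The sketch below is not the Hadoop class itself: Pair is a hypothetical stand-in that has the same write and readFields signatures as the Writable interface, and PairRoundTrip, roundTrip and the sample label are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PairRoundTrip {
    // Hypothetical stand-in with the same two methods that Writable declares.
    public static class Pair {
        public double number;
        public String label;

        public void write(DataOutput out) throws IOException {
            out.writeDouble(number);
            out.writeUTF(label);
        }

        public void readFields(DataInput in) throws IOException {
            // Read in exactly the order used by write().
            number = in.readDouble();
            label = in.readUTF();
        }
    }

    // Serializes a pair to bytes and reads it back into a fresh instance.
    public static Pair roundTrip(double number, String label) {
        try {
            Pair original = new Pair();
            original.number = number;
            original.label = label;

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            Pair copy = new Pair();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Pair copy = roundTrip(21.25, "SundayMorningVoice");
        System.out.println(copy.number + " " + copy.label);
    }
}
```

Hadoop performs the same write-then-readFields sequence when it moves values from mappers to reducers, which is also why a Writable needs a no-argument constructor it can instantiate before calling readFields.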
Regarding "java - Sending multiple parameters to a reducer in MapReduce", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/14516029/