hadoop - Issue when working with ArrayWritables

Tags: hadoop mapreduce hadoop2

I am a beginner in Hadoop and I am working with ArrayWritables in a Hadoop map-reduce job.

This is the Mapper code I am using:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Base_Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    String currLine[] = new String[1000];
    Text K = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split the line into single characters (bases) and emit each base
        // together with its position in the line.
        currLine = line.split("");
        int count = 0;
        for (int i = 0; i < currLine.length; i++) {
            String currToken = currLine[i];
            count++;
            K.set(currToken);
            context.write(K, new IntWritable(count));
        }
    }
}

And here is the Reducer:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Base_Reducer extends Reducer<Text, IntWritable, Text, IntArrayWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        IntArrayWritable finalArray = new IntArrayWritable();
        IntWritable[] arr = new IntWritable[1000];
        for (int i = 0; i < 150; i++)
            arr[i] = new IntWritable(0);
        int redCount = 0;
        // Record, for positions 1..150, where this base occurred.
        for (IntWritable val : values) {
            int thisValue = val.get();
            for (int i = 1; i <= 150; i++) {
                if (thisValue == i)
                    arr[i - 1] = new IntWritable(redCount++);
            }
        }
        finalArray.set(arr);
        context.write(key, finalArray);
    }
}

I am using IntArrayWritable as a subclass of ArrayWritable, as shown below:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }

    public IntArrayWritable(IntWritable[] values) {
        super(IntWritable.class, values);
    }
}

The expected output of my job is a set of bases as keys (that part is correct) and a set of IntWritables as values. But the output I am getting is:

    com.feathersoft.Base.IntArrayWritable@30374534
A   com.feathersoft.Base.IntArrayWritable@7ca071a6
C   com.feathersoft.Base.IntArrayWritable@9858936
G   com.feathersoft.Base.IntArrayWritable@1df33d1c
N   com.feathersoft.Base.IntArrayWritable@4c3108a0
T   com.feathersoft.Base.IntArrayWritable@272d6774

What changes do I have to make to fix this issue?

Best Answer

You need to override the default behavior of the toString() method in your IntArrayWritable implementation.

Please try this:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }

    public IntArrayWritable(IntWritable[] values) {
        super(IntWritable.class, values);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");

        // super.toStrings() returns the toString() of each stored IntWritable.
        for (String s : super.toStrings()) {
            sb.append(s).append(" ");
        }

        sb.append("]");
        return sb.toString();
    }
}
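
As a side note, the reason toString() matters here is that the default TextOutputFormat writes reducer output by calling toString() on the key and the value, which is why the object hash code appears when toString() is not overridden. Below is a minimal driver sketch that wires the classes above together and declares IntArrayWritable as the reducer output value class; the driver class name and the use of command-line arguments for the paths are my assumptions, not from the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BaseDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "base positions");
        job.setJarByClass(BaseDriver.class);

        job.setMapperClass(Base_Mapper.class);
        job.setReducerClass(Base_Reducer.class);

        // Mapper output types differ from the final output types,
        // so they are declared separately.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // The default TextOutputFormat calls toString() on this value class,
        // which is why overriding toString() in IntArrayWritable fixes the output.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntArrayWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With the toString() override in place, each output line should then show the key followed by a bracketed list of numbers instead of the object hash code.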

If this answer works for you, please mark it as accepted. Thanks.

Regarding "hadoop - Issue when working with ArrayWritables", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28670191/
