I'm using ArrayWritable, and at some point I needed to check how Hadoop serializes it. This is what I got by setting job.setNumReduceTasks(0):
0 IntArrayWritable@10f11b8
3 IntArrayWritable@544ec1
6 IntArrayWritable@fe748f
8 IntArrayWritable@1968e23
11 IntArrayWritable@14da8f4
14 IntArrayWritable@18f6235
This is the test mapper I used:
public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntArrayWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        int red = Integer.parseInt(value.toString());
        IntWritable[] a = new IntWritable[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(red + i);
        }
        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);
        context.write(key, aw);
    }
}
IntArrayWritable is taken from the example given in the javadoc for ArrayWritable:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
I actually checked Hadoop's source code, and this makes no sense to me. ArrayWritable should not serialize the class name, and an array of 100 IntWritables cannot possibly be serialized into 6-7 hex characters. The application actually seems to work fine, and the reducer deserializes the correct values... What is going on? What am I missing?
Best answer
You have to override the default toString() method. It is called by TextOutputFormat to create a human-readable representation. Try the following code and look at the result:
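In other words, what the question is seeing is not the serialized bytes at all, but Object's default toString(): the class name plus the identity hash code in hex. A minimal stand-alone sketch (no Hadoop needed; IntArrayWritableLike is a hypothetical stand-in for any class that does not override toString()):

```java
public class ToStringDemo {
    // Hypothetical stand-in: any class that inherits Object.toString()
    static class IntArrayWritableLike { }

    public static void main(String[] args) {
        Object o = new IntArrayWritableLike();
        // Object.toString() is specified as:
        //   getClass().getName() + "@" + Integer.toHexString(hashCode())
        String s = o.toString();
        System.out.println(s); // e.g. ToStringDemo$IntArrayWritableLike@1b6d3586

        // The part after '@' is exactly the hash code in hex
        String hex = s.substring(s.indexOf('@') + 1);
        System.out.println(hex.equals(Integer.toHexString(o.hashCode())));
    }
}
```

That matches the 6-7 hex digits in the output above: it is a hash code, not a serialized array.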
public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (String s : super.toStrings()) {
            sb.append(s).append(" ");
        }
        return sb.toString();
    }
}
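As for the actual on-the-wire size: a minimal sketch of the assumed wire format (an int element count followed by each element's payload, with no class name, mirroring how ArrayWritable.write delegates to each element's write) shows that 100 IntWritables really occupy a few hundred bytes, nothing like the short string printed above. ArrayWriteSketch and serialize are hypothetical names for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ArrayWriteSketch {
    // Sketch of the assumed format: element count, then each 4-byte int payload
    static byte[] serialize(int[] values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(values.length);          // array length first
        for (int v : values) {
            out.writeInt(v);                  // each element's 4-byte payload
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new int[100]);
        // 4 bytes for the length + 100 * 4 bytes of data = 404 bytes
        System.out.println(bytes.length);
    }
}
```

So the reducer was always receiving the full array correctly; only the mapper-output text file was misleading, because TextOutputFormat prints toString(), not the serialized bytes.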
About java - Serialization using ArrayWritable seems to work in a funny way: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7919035/