How do I define an ArrayWritable for a custom Hadoop type? I'm trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.
I have an IndividualPosting class that stores the term frequency, document id, and a list of byte offsets of the term within the document.
I have a Posting class that holds the document frequency (the number of documents the term appears in) and a list of IndividualPostings.
I have defined a LongArrayWritable, extending the ArrayWritable class, for the list of byte offsets in IndividualPosting.
When I defined a custom ArrayWritable for IndividualPosting, I ran into problems after deploying locally (using Karmasphere, Eclipse): all the IndividualPosting instances in the Posting class's list were identical, even though I got different values in the Reduce method.
Best Answer
From the documentation of ArrayWritable:
A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:
public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
You've already referenced doing this with the WritableComparable types that Hadoop defines. So for LongWritable, I assume your implementation looks something like this:
public static class LongArrayWritable extends ArrayWritable {
    public LongArrayWritable() {
        super(LongWritable.class);
    }

    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}
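If it helps to see the wire format concretely, here is a plain-Java sketch (no Hadoop dependency; the class and method names are my own) of what, as I understand it, ArrayWritable's serialization does: write an int element count, then each element in turn. This is also why the no-argument constructor must pass the element class to super() — readFields() needs it to instantiate fresh elements when deserializing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in sketch of ArrayWritable's wire format for an array
// of longs: an int length prefix followed by each element's serialization.
public class LongArraySketch {
    static byte[] write(long[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);      // element count first
        for (long v : values) {
            out.writeLong(v);             // then each element (one LongWritable each)
        }
        return bytes.toByteArray();
    }

    static long[] read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int length = in.readInt();
        long[] values = new long[length]; // fresh storage per element on read
        for (int i = 0; i < length; i++) {
            values[i] = in.readLong();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        long[] offsets = {4L, 17L, 42L};
        long[] roundTrip = read(write(offsets));
        System.out.println(java.util.Arrays.toString(roundTrip)); // prints: [4, 17, 42]
    }
}
```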
You should be able to do this with any type that implements WritableComparable, as shown in the documentation. Using their example:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
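You can exercise the write()/readFields() contract above without a Hadoop cluster, since it only relies on java.io. Below is a minimal round-trip harness; the outer class name and the roundTrip helper are my own, and java.lang.Comparable stands in for Hadoop's WritableComparable so the snippet compiles standalone:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {
    // Stand-in for the example class above; Comparable replaces Hadoop's
    // WritableComparable so this compiles without the Hadoop jars.
    public static class MyWritableComparable implements Comparable<MyWritableComparable> {
        private int counter;
        private long timestamp;

        public MyWritableComparable(int counter, long timestamp) {
            this.counter = counter;
            this.timestamp = timestamp;
        }

        public void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }

        public void readFields(DataInput in) throws IOException {
            counter = in.readInt();
            timestamp = in.readLong();
        }

        public int compareTo(MyWritableComparable other) {
            return Integer.compare(this.counter, other.counter);
        }

        public int getCounter() { return counter; }
        public long getTimestamp() { return timestamp; }
    }

    // Serialize one instance, then deserialize the bytes into a fresh one.
    public static MyWritableComparable roundTrip(MyWritableComparable original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        MyWritableComparable copy = new MyWritableComparable(0, 0L);
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        MyWritableComparable copy = roundTrip(new MyWritableComparable(7, 1234L));
        System.out.println(copy.getCounter() + " " + copy.getTimestamp()); // prints: 7 1234
    }
}
```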
And that should be it. This assumes you're using revision 0.20.2 or 0.21.0 of the Hadoop API.
Regarding "hadoop - ArrayWritable implementation for custom Hadoop types", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4386781/