hadoop - Serializing a long string in Hadoop

Tags: hadoop mapreduce

I have a class that implements WritableComparable in Hadoop. The class holds two String variables, one short and one very long. I write these variables with writeChars and read them back with readLine, but I seem to be running into some kind of error. What is the best way to serialize such a long string in Hadoop?

Best Answer

I think you can use BytesWritable for efficiency. Take a look at the following custom key, which uses a BytesWritable field for callId.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

public class CustomMRKey implements WritableComparable<CustomMRKey> {
private BytesWritable callId;
private IntWritable mapperType;

/**
 * Default constructor.
 */
public CustomMRKey() {
    set(new BytesWritable(), new IntWritable());
}

/**
 * Constructor
 * 
 * @param callId
 * @param mapperType
 */
public CustomMRKey(BytesWritable callId, IntWritable mapperType) {
    set(callId, mapperType);
}

/**
 * sets the call id and mapper type
 * 
 * @param callId
 * @param mapperType
 */
public void set(BytesWritable callId, IntWritable mapperType) {
    this.callId = callId;
    this.mapperType = mapperType;
}

/**
 * This method returns the callId
 * 
 * @return callId
 */
public BytesWritable getCallId() {
    return callId;
}

/**
 * This method sets the callId given a callId
 * 
 * @param callId
 */
public void setCallId(BytesWritable callId) {
    this.callId = callId;
}

/**
 * This method returns the mapper type
 * 
 * @return mapperType
 */
public IntWritable getMapperType() {
    return mapperType;
}

/**
 * This method is set to store the mapper type
 * 
 * @param mapperType
 */
public void setMapperType(IntWritable mapperType) {
    this.mapperType = mapperType;
}

@Override
public void readFields(DataInput in) throws IOException {
    callId.readFields(in);
    mapperType.readFields(in);
}

@Override
public void write(DataOutput out) throws IOException {
    callId.write(out);
    mapperType.write(out);
}

@Override
public boolean equals(Object obj) {
    if (obj instanceof CustomMRKey) {
        CustomMRKey key = (CustomMRKey) obj;
        return callId.equals(key.callId)
                && mapperType.equals(key.mapperType);
    }
    return false;
}

@Override
public int compareTo(CustomMRKey key) {
    int cmp = callId.compareTo(key.getCallId());
    if (cmp != 0) {
        return cmp;
    }
    return mapperType.compareTo(key.getMapperType());
}

}
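
One caveat worth noting: the class above overrides equals but not hashCode, and Hadoop's default HashPartitioner routes keys to reducers via hashCode(), so equal keys could land on different reducers. Here is a minimal, self-contained sketch of a hashCode consistent with that equals, using plain-Java stand-ins (the SimpleKey class and its fields are illustrative, not part of the original answer):

```java
import java.util.Arrays;

public class KeyHashDemo {
    // Simplified stand-in for CustomMRKey: a byte[] callId plus an int mapperType.
    static final class SimpleKey {
        final byte[] callId;
        final int mapperType;

        SimpleKey(byte[] callId, int mapperType) {
            this.callId = callId;
            this.mapperType = mapperType;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleKey)) return false;
            SimpleKey k = (SimpleKey) o;
            return Arrays.equals(callId, k.callId) && mapperType == k.mapperType;
        }

        // equals and hashCode must agree, or a hash-based partitioner may
        // send keys that compare equal to different reducers.
        @Override
        public int hashCode() {
            return 31 * Arrays.hashCode(callId) + mapperType;
        }
    }

    public static void main(String[] args) {
        SimpleKey a = new SimpleKey("abc".getBytes(), 1);
        SimpleKey b = new SimpleKey("abc".getBytes(), 1);
        int partitions = 10;
        // Mirrors HashPartitioner's (hashCode() & Integer.MAX_VALUE) % numReduceTasks:
        // equal keys now map to the same partition.
        System.out.println((a.hashCode() & Integer.MAX_VALUE) % partitions
                == (b.hashCode() & Integer.MAX_VALUE) % partitions); // prints "true"
    }
}
```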

To use this in, say, your mapper code, you can build the key in BytesWritable form as follows:

CustomMRKey customKey = new CustomMRKey(new BytesWritable(), new IntWritable());
customKey.setCallId(makeKey(value, this.resultKey));
customKey.setMapperType(this.mapTypeIndicator);

The makeKey method then looks like this:

public BytesWritable makeKey(Text value, BytesWritable key) throws IOException {
    try {
        ByteArrayOutputStream byteKey = new ByteArrayOutputStream(Constants.MR_DEFAULT_KEY_SIZE);
        for (String field : keyFields) {
            // keyFields and getString(...) belong to the surrounding class in the
            // original answer; getString is assumed to pull the named field out of
            // the record (it is not part of the Hadoop Text API).
            byte[] bytes = value.getString(field).getBytes("UTF-8");
            byteKey.write(bytes, 0, bytes.length);
        }
        if (key == null) {
            return new BytesWritable(byteKey.toByteArray());
        } else {
            key.set(byteKey.toByteArray(), 0, byteKey.size());
            return key;
        }
    } catch (Exception ex) {
        throw new IOException("Could not generate key", ex);
    }
}
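
For completeness, the likely root cause in the question is that writeChars and readLine are not inverses: writeChars emits two bytes per char, while the deprecated readLine reads one byte per char until a newline, so the pair corrupts any round trip (and the writeUTF/readUTF alternative caps strings at 65,535 encoded bytes). Hadoop's Text avoids both problems by writing a length prefix followed by UTF-8 bytes; the same idea can be sketched in plain java.io with no Hadoop dependency (class and method names here are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LongStringDemo {
    // Write a String as a 4-byte length prefix followed by its UTF-8 bytes
    // (Text uses a variable-length prefix, but the principle is the same).
    static void writeLongString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    static String readLongString(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a string far beyond writeUTF's 65,535-byte limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) sb.append('x');
        String longStr = sb.toString();

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLongString(new DataOutputStream(buf), longStr);
        String back = readLongString(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.equals(longStr)); // prints "true"
    }
}
```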

Hope this helps.

On hadoop - serializing a long string in Hadoop, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20670404/
