java - wrong value class: org.apache.mahout.math.VarLongWritable is not class org.apache.mahout.math.VectorWritable

Tags: java hadoop mahout

I ran into a problem while generating recommendations with Mahout on Hadoop.

The error message is:

Error: java.io.IOException: wrong value class: org.apache.mahout.math.VarLongWritable is not class org.apache.mahout.math.VectorWritable
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1378)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

And the job configuration in the main method is:

    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(FilesToItemPrefsMapper.class);
    job.setMapOutputKeyClass(VarLongWritable.class);
    job.setMapOutputValueClass(VarLongWritable.class);

    job.setReducerClass(FileToUserVectorReducer.class);
    job.setOutputKeyClass(VarLongWritable.class);
    job.setOutputValueClass(VectorWritable.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.NONE);

The mapper is:
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
        String line = value.toString();
        Matcher m = NUMBERS.matcher(line);
        m.find();
        VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
        VarLongWritable itemID = new VarLongWritable();
        while (m.find()){
            itemID.set(Long.parseLong(m.group()));
            context.write(userID, itemID);
        }
}

The reducer is:
public class FileToUserVectorReducer 
        extends Reducer<VarLongWritable, VarLongWritable, VarLongWritable, VectorWritable> {
    public void reducer(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs, Context context)
        throws IOException, InterruptedException{
        Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
        for(VarLongWritable itemPref : itemPrefs){
            userVector.set((int)itemPref.get(), 1.0f);
        }
        context.write(userID, new VectorWritable(userVector));
    }
}

I thought the reducer's output value class was set to VectorWritable by job.setOutputValueClass(VectorWritable.class). If so, why does it raise this error?

Best answer

The problem is in the reducer method: reducer(...) should be reduce(...), i.e.:

public class FileToUserVectorReducer 
        extends Reducer<VarLongWritable, VarLongWritable, VarLongWritable, VectorWritable> {
    @Override
    public void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs, Context context)
        throws IOException, InterruptedException{
        Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
        for(VarLongWritable itemPref : itemPrefs){
            userVector.set((int)itemPref.get(), 1.0f);
        }
        context.write(userID, new VectorWritable(userVector));
    }
}

Because reducer does not override reduce, Hadoop falls back to the inherited identity implementation Reducer.reduce (visible in the stack trace at Reducer.java:150), which passes each VarLongWritable map value straight through to the output, clashing with the declared VectorWritable output value class. @Override is very helpful here: had I used @Override, the compiler would have reported the mistake at compile time. I thought it was unnecessary at first, but this experience proved its value.
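The kind of mistake @Override catches can be shown with plain Java, independent of Hadoop (Base and Sub below are hypothetical illustration names, not classes from the original code): without the annotation, a misspelled method silently becomes a new, unrelated method instead of an override.

```java
class Base {
    // Plays the role of Hadoop's Reducer.reduce: a default the framework calls.
    public String reduce(String input) {
        return "base:" + input;
    }
}

class Sub extends Base {
    // Correct override: @Override makes the compiler verify that a matching
    // method signature actually exists in the superclass.
    @Override
    public String reduce(String input) {
        return "sub:" + input;
    }
    // If this method were spelled "reducer", the @Override annotation would
    // cause a compile-time error; without @Override it would compile fine,
    // and Base.reduce would silently run instead.
}
```

Calling reduce through a Base reference dispatches to Sub's version, just as Hadoop invokes an overridden reduce; with the typo, the base (pass-through) version runs instead.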

Regarding "java - wrong value class: org.apache.mahout.math.VarLongWritable is not class org.apache.mahout.math.VectorWritable", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37158219/
