hadoop - Mahout-运行trainnb时出错

标签 hadoop mahout

使用Mahout seq2sparse命令,我设法在HDFS中成功创建以下文件夹

df-count
dictionary.file-0
frequency.file-0
tf-vectors
tfidf-vectors
tokenized-documents
wordcount

之后,当我使用以下语法运行trainnb命令时
mahout trainnb -i tweet-vectors -el -li labelindex -o model -ow -c

我收到以下错误。有人知道相同的分辨率吗?
Exception in thread "main" java.lang.IllegalStateException: hdfs://machineinfo:8020/user/hhhh/tweetvectors/df-count
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirIterator$1.apply(SequenceFileDirIterator.java:115)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirIterator$1.apply(SequenceFileDirIterator.java:106)
        at com.google.common.collect.Iterators$8.transform(Iterators.java:860)
        at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
        at com.google.common.collect.Iterators$5.hasNext(Iterators.java:597)
        at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:43)
        at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:122)
        at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:180)
        at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:94)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.main(TrainNaiveBayesJob.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.FileNotFoundException: File does not exist: /user/hhhh/tweet-vectors/df-count
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2006)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1975)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1967)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:735)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1499)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:63)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirIterator$1.apply(SequenceFileDirIterator.java:110)
        ... 22 more

最佳答案

看来mahout无法在HDFS中看到/user/hhhh/tweet-vectors/df-count文件。

首先,尝试hadoop dfs -ls /user/hhhh/tweet-vectors/df-count验证文件是否存在。
如果不存在,那就是您的问题。如果确实存在,请检查它是文件还是目录。 mahout似乎在寻找文件,而不是目录。

如果存在并且是文件,则验证mahout是否连接到存储文件的相同hadoop namenode实例。

关于hadoop - Mahout-运行trainnb时出错,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/18808723/

相关文章:

mahout - 在学习 Mahout 之前我需要掌握 Hadoop 吗?

java - 将 mahout 随机森林分类输出转换为可读

hadoop - 通过将itemid列与userid交换,我可以将基于item的Hadoop实现用于基于用户的实现吗?

sorting - 排序(Order by)在Hive中是如何实现的?

json - 将数据从多个 Hive 表转换为复杂的 JSON

xml - Hadoop - 使用 XPath 在 XML 节点上进行 PIG 循环

java - mahout - 运行 grouplens 示例错误

python - 我的 boto elastic mapreduce jar 作业流程参数有什么问题?

java - Hadoop WritableComparable<T> RawType,什么是 RawType

hadoop - 德鲁伊中的精确不同计数