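I am trying to run a support vector machine (SVM) classifier from my Java code using the Weka API. At line 13 I get the error "Src and Dest differ in # of attributes". Please advise how to resolve this and run the classifier successfully (this is for a time-critical academic project). If possible, please illustrate the solution with an example.
Error: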
java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 22 != 31
at weka.core.RelationalLocator.copyRelationalValues(RelationalLocator.java:88)
at weka.filters.Filter.copyValues(Filter.java:359)
at weka.filters.Filter.push(Filter.java:276)
at weka.filters.unsupervised.attribute.Standardize.convertInstance(Standardize.java:260)
at weka.filters.unsupervised.attribute.Standardize.input(Standardize.java:142)
at weka.filters.Filter.useFilter(Filter.java:661)
at ClassifierJan12Pure.main(ClassifierJan12Pure.java:139)
Java code:
1. filteredData = new Instances(new BufferedReader(new FileReader("training.arff")));
2. filteredData.setClassIndex(0);
3. Classifier classifier=new SMO();
4. classifier.buildClassifier(filteredData);
5. FilteredClassifier filteredClassifier=new FilteredClassifier();
6. filteredClassifier.setClassifier(classifier);
7. Filter filter = new StringToWordVector(1000);
8. filteredClassifier.setFilter(filter);
9. Instances filteredTests= new Instances(new BufferedReader(new FileReader("testing.arff")));
10. Standardize sfilter = new Standardize();
11. sfilter.setInputFormat(filteredData); // initializing the filter once with training set
12. Instances newTrain = Filter.useFilter(filteredData, sfilter); // configures the Filter based on train instances and returns filtered instances
**13. Instances testsF = Filter.useFilter(filteredTests, sfilter); // create new test set ///<<<<<<< Error line.**
14. Evaluation eval = new Evaluation(filteredData);
15. eval.evaluateModel(filteredClassifier, filteredTests);
16. System.out.println(eval.toSummaryString("\nResults\n", false));
training.arff
@relation '_Users_Passionate_Desktop_Training_Text-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'
@attribute @@class@@ {dummy,ham,spam}
@attribute about numeric
@attribute and numeric
@attribute be numeric
@attribute construction numeric
@attribute discussion numeric
@attribute me numeric
@attribute much numeric
@attribute np numeric
@attribute of numeric
@attribute reminds numeric
@attribute s numeric
@attribute said numeric
@attribute second numeric
@attribute that numeric
@attribute the numeric
@attribute to numeric
@attribute very numeric
@attribute agree numeric
@attribute auto numeric
@attribute boxes numeric
@attribute credit numeric
@attribute dear numeric
@attribute display numeric
@attribute have numeric
@attribute ll numeric
@attribute nlpeople numeric
@attribute sure numeric
@attribute we numeric
@attribute with numeric
@attribute you numeric
@data
{0 ham,1 1,2 1,3 1,4 1,7 1,8 1,12 1,13 1,15 1,16 1,17 1}
{0 ham,5 1,6 1,8 1,9 1,10 1,11 1,14 1,15 1}
{0 spam,18 1,19 1,22 1,25 1,26 1,27 1,30 1}
{0 spam,20 1,21 1,23 1,24 1,28 1,29 1}
testing.arff
@relation '_Users_Passionate_Desktop_TestingDiffThanTesting_Text-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'
@attribute @@class@@ {dummy,ham,spam}
@attribute brazilian numeric
@attribute do numeric
@attribute in numeric
@attribute indigenous numeric
@attribute languages numeric
@attribute linguists numeric
@attribute on numeric
@attribute prepare numeric
@attribute research numeric
@attribute specialization numeric
@attribute to numeric
@attribute and numeric
@attribute be numeric
@attribute developed numeric
@attribute evaluation numeric
@attribute of numeric
@attribute program numeric
@attribute published numeric
@attribute reanalysis numeric
@attribute the numeric
@attribute will numeric
@data
{0 ham,1 1,3 1,4 1,5 1,10 1}
{0 ham,2 1,4 1,6 1,7 1,8 1,9 1,11 1}
{0 spam,12 1,15 1,16 1,18 1,19 1}
{0 spam,5 1,13 1,14 1,17 1,20 1,21 1}
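The two headers above account for the numbers in the exception: training.arff declares 31 attributes and testing.arff declares 22, because each file was vectorized on its own and therefore got its own dictionary. The library-free Java sketch below illustrates that effect. It is not Weka code; the class name, method names, and sample sentences are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SharedVocabularySketch {

    // Collect the distinct lowercase words of a corpus, in first-seen order.
    static List<String> buildVocabulary(List<String> documents) {
        Set<String> words = new LinkedHashSet<>();
        for (String doc : documents) {
            for (String token : doc.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return new ArrayList<>(words);
    }

    // Turn one document into a 0/1 presence vector over a fixed vocabulary.
    // Words that are not in the vocabulary are ignored.
    static int[] vectorize(String document, List<String> vocabulary) {
        Set<String> present =
                new HashSet<>(Arrays.asList(document.toLowerCase().split("\\W+")));
        int[] vector = new int[vocabulary.size()];
        for (int i = 0; i < vocabulary.size(); i++) {
            vector[i] = present.contains(vocabulary.get(i)) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("we agree with you", "credit boxes display");
        List<String> test = Arrays.asList("brazilian linguists agree");

        // Vocabularies built independently have different sizes,
        // which is the analogue of "22 != 31".
        List<String> trainVocab = buildVocabulary(train);
        List<String> testVocab = buildVocabulary(test);
        System.out.println(trainVocab.size() + " vs " + testVocab.size()); // prints "7 vs 3"

        // Reusing the training vocabulary gives the test vector the training width.
        int[] testVector = vectorize(test.get(0), trainVocab);
        System.out.println(testVector.length == trainVocab.size()); // prints "true"
    }
}
```

Any filter that was initialized on the training header, such as Standardize here, expects the test data to have exactly that layout.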
Best answer
Thanks to James and Michael, I was able to resolve the "Src and Dest differ in # of attributes" problem.
Step 1: Run TextDirectoryLoader on the training and testing folders.
java weka.core.converters.TextDirectoryLoader -dir Testing_Text > testing.arff
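Step 2: The following command, run in batch mode (-b), gives me the same number of attributes in the training and testing files (http://weka.wikispaces.com/Batch+filtering). My mistake was running the StringToWordVector command separately on each file, which produced a different number of attributes in training.arff and testing.arff.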
java weka.filters.unsupervised.attribute.StringToWordVector -b -i training.arff -o train_std.arff -r testing.arff -s test_std.arff
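The same batch filtering can probably be done from Java instead of the command line. The sketch below assumes that training.arff and testing.arff are the raw TextDirectoryLoader outputs from Step 1 and that weka.jar is on the classpath. The class name BatchFilterSketch is made up, and the code has not been run against the files in this post.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class BatchFilterSketch {
    public static void main(String[] args) throws Exception {
        // Raw text data, before any word-vector conversion (assumed file names).
        Instances rawTrain = DataSource.read("training.arff");
        Instances rawTest = DataSource.read("testing.arff");

        StringToWordVector wordVector = new StringToWordVector();
        wordVector.setWordsToKeep(1000);
        // Initialize the filter once, with the training data format.
        wordVector.setInputFormat(rawTrain);

        // The first batch fixes the dictionary; the second batch reuses it.
        Instances train = Filter.useFilter(rawTrain, wordVector);
        Instances test = Filter.useFilter(rawTest, wordVector);

        // Both sets should now report the same attribute count and header.
        System.out.println(train.numAttributes() + " / " + test.numAttributes());
        System.out.println(train.equalHeaders(test));
    }
}
```

The important part is that one filter instance handles both sets, training first. Creating a second StringToWordVector for the test set would bring back the mismatch.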
Now I am facing another problem:
Even though I call setClassIndex at line 3, it still shows the error message "Class is not set!".
Error:
weka.core.UnassignedClassException: Class is not set!
weka.core.UnassignedClassException: Class is not set!
at weka.core.Instance.setClassMissing(Instance.java:544)
at weka.classifiers.Evaluation.evaluateModelOnceAndRecordPrediction(Evaluation.java:1439)
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1412)
at ClassifierJan12Pure.main(ClassifierJan12Pure.java:94)
Java code:
1. filteredData = new Instances(new BufferedReader(new FileReader("/Users/Passionate/Desktop/train_std.arff")));
2. Instances filteredTests= new Instances(new BufferedReader(new FileReader("/Users/Passionate/Desktop/test_std.arff")));
3. filteredData.setClassIndex(filteredData.attribute("@@class@@").index());
4. Classifier classifier=new SMO();
5. classifier.buildClassifier(filteredData);
6. FilteredClassifier filteredClassifier=new FilteredClassifier();
7. filteredClassifier.setClassifier(classifier);
8. Evaluation eval = new Evaluation(filteredData);
9. eval.evaluateModel(filteredClassifier, filteredTests); **// Error line.**
10. System.out.println(eval.toSummaryString("\nResults\n======\n", false));
train_std.arff file:
@relation '_Users_Passionate_Desktop_Training_Text-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'
@attribute @@class@@ {dummy,ham,spam}
@attribute a numeric
@attribute address numeric
@attribute all numeric
@attribute as numeric
@attribute at numeric
@attribute back numeric
@attribute boxes numeric
@attribute credit numeric
@attribute display numeric
@attribute had numeric
@attribute happy numeric
@attribute have numeric
@attribute hoped numeric
@attribute is numeric
@attribute line numeric
@attribute message numeric
@attribute moderators numeric
@attribute new numeric
@attribute not numeric
@attribute of numeric
@attribute on numeric
@attribute our numeric
@attribute remember numeric
@attribute running numeric
@attribute subscribers numeric
@attribute the numeric
@attribute those numeric
@attribute to numeric
@attribute very numeric
@attribute we numeric
@attribute with numeric
@data
{0 ham,1 1,3 1,11 1,16 1,17 1,22 1,25 1,28 1,29 1}
{0 ham,4 1,6 1,10 1,13 1,14 1,15 1,19 1,21 1,26 1,30 1}
{0 spam,2 1,5 1,18 1,20 1,22 1,23 1,24 1,27 1}
{0 spam,7 1,8 1,9 1,12 1,30 1,31 1}
test_std.arff file:
@relation '_Users_Passionate_Desktop_Training_Text-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'
@attribute @@class@@ {dummy,ham,spam}
@attribute a numeric
@attribute address numeric
@attribute all numeric
@attribute as numeric
@attribute at numeric
@attribute back numeric
@attribute boxes numeric
@attribute credit numeric
@attribute display numeric
@attribute had numeric
@attribute happy numeric
@attribute have numeric
@attribute hoped numeric
@attribute is numeric
@attribute line numeric
@attribute message numeric
@attribute moderators numeric
@attribute new numeric
@attribute not numeric
@attribute of numeric
@attribute on numeric
@attribute our numeric
@attribute remember numeric
@attribute running numeric
@attribute subscribers numeric
@attribute the numeric
@attribute those numeric
@attribute to numeric
@attribute very numeric
@attribute we numeric
@attribute with numeric
@data
{0 ham}
{0 ham,21 1,28 1}
{0 spam,20 1}
{0 spam,26 1}
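A likely cause of "Class is not set!" in the listing above: setClassIndex is called only on filteredData, while the stack trace points into Evaluation.evaluateModel, which works on the test instances. The class index belongs to each Instances object separately, so filteredTests still has no class attribute. In addition, the FilteredClassifier in that listing is never given a filter and never built. Since both files are already vectorized, the sketch below evaluates the trained SMO directly. The class name EvaluateSketch is made up, and the code has not been run against the author's data.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EvaluateSketch {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("/Users/Passionate/Desktop/train_std.arff");
        Instances test = DataSource.read("/Users/Passionate/Desktop/test_std.arff");

        // The class index must be set on BOTH data sets, not only the training set.
        train.setClassIndex(train.attribute("@@class@@").index());
        test.setClassIndex(test.attribute("@@class@@").index());

        SMO classifier = new SMO();
        classifier.buildClassifier(train);

        // Evaluate the classifier that was actually trained.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(classifier, test);
        System.out.println(eval.toSummaryString("\nResults\n======\n", false));
    }
}
```

If a FilteredClassifier is preferred, it would need the raw text data, a StringToWordVector set through setFilter, and its own buildClassifier call before evaluation.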
Regarding "java - Weka: Src and Dest differ in # of attributes, using Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14189011/