python sklearn cross_validation: number of labels does not match number of samples

Tags: python scikit-learn cross-validation

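I'm taking a machine learning class and want to split my data into a training set and a test set, train a decision tree on the training set, and then print the score on the test set. The cross-validation parameters in my code were given to me. Can anyone see what I'm doing wrong?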

The error I get is:

Traceback (most recent call last):
  File "/home/stephan/ud120-projects/validation/validate_poi.py", line 36, in <module>
    clf = clf.fit(features_train, labels_train)
  File "/home/stephan/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 221, in fit
    "number of samples=%d" % (len(y), n_samples))
ValueError: Number of labels=29 does not match number of samples=66

Here is my code:

import pickle
import sys
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit

data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "r") )

features_list = ["poi", "salary"]

data = featureFormat(data_dict, features_list)
labels, features = targetFeatureSplit(data)

from sklearn import tree
from sklearn import cross_validation

features_train, labels_train, features_test, labels_test = \
    cross_validation.train_test_split(features, labels, random_state=42, test_size=0.3)



clf = tree.DecisionTreeClassifier()
clf = clf.fit(features_train, labels_train)
print clf.score(features_test, labels_test)

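Best answer

Your variables are unpacked in the wrong order for what train_test_split returns. Given two input arrays, it returns the train and test parts of the first array, followed by the train and test parts of the second. In your code, labels_train therefore receives the test features (29 rows) while features_train holds the 66 training rows, which is exactly the mismatch the ValueError reports.

Try: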

features_train, features_test, labels_train, labels_test = ...
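
For reference, here is a minimal sketch of the corrected split on a current scikit-learn release. It makes several assumptions that are not in the original post: it is written for Python 3, it imports train_test_split from sklearn.model_selection (the sklearn.cross_validation module used in the question was deprecated in 0.18 and later removed), and it uses a synthetic 95-row array as a stand-in for the real dataset, with 95 chosen only because 29 + 66 = 95 in the traceback.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the real data comes from featureFormat).
rng = np.random.RandomState(42)
features = rng.rand(95, 1)                    # one feature column, like "salary"
labels = (features[:, 0] > 0.8).astype(int)   # made-up binary labels, like "poi"

# Return order: train/test parts of the first array, then of the second.
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.3, random_state=42)

# Feature and label arrays for the same split now have matching lengths.
print(features_train.shape, labels_train.shape)
print(features_test.shape, labels_test.shape)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(features_train, labels_train)
score = clf.score(features_test, labels_test)
print(score)
```

Printing the shapes right after the split is a cheap way to catch this kind of unpacking mistake before it reaches fit.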

Original question on Stack Overflow: https://stackoverflow.com/questions/30958433/
