Python sklearn predict function

Tags: python scikit-learn classification predict

I have a problem: I built my own classifier, and it is finished and works perfectly, but when I try to use the cross-validation score I get this error:

  File "/home/webinterpret/workspace/nlp/wi-item-attribute-extraction/attr_extractor.py", line 95, in fit
    print self.fitted_models[attr][len(self.fitted_models[attr]) - 1].cross_validation_score(x_train, y_train, 5, 0.2)
  File "/home/webinterpret/workspace/nlp/wi-item-attribute-extraction/attr_extractor.py", line 163, in cross_validation_score
    cv=self.cv).mean()
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1361, in cross_val_score
    for train, test in cv)
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 659, in __call__
    self.dispatch(function, args, kwargs)
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 406, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 140, in __init__
    self.results = func(*args, **kwargs)
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1478, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1534, in _score
    score = scorer(estimator, X_test, y_test)
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 201, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/home/webinterpret/workspace/nlp/wi-item-attribute-extraction/attr_extractor.py", line 198, in score
    return (pd.Series(self.predict(x_test)) == y_test).mean()
  File "/home/webinterpret/workspace/nlp/wi-item-attribute-extraction/attr_extractor.py", line 190, in predict
    result[i] = 1 if self.pattern in item else 0
  File "/home/webinterpret/.virtualenvs/nlp/local/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 216, in __eq__
    if np.isnan(other):
TypeError: Not implemented for this type

My predict function:

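import numpy as np

def predict(self, text):
    # Mark each item with 1 if it contains the pattern, else 0
    result = np.zeros(text.shape[0])
    for i, item in enumerate(text):
        result[i] = 1 if self.pattern in item else 0
    return result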

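The error is raised at "if self.pattern in item else 0", but I don't know how to implement it differently.

The pattern is a piece of text, for example "car", and each text is just a string such as "This car is broken."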
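For reference, the substring check itself behaves as intended when `predict` receives plain Python strings. The traceback, however, ends in `scipy/sparse/compressed.py` in `__eq__`, which suggests that `item` was a sparse matrix row rather than a string when the check ran. The sketch below is a minimal stand-in, not the asker's actual class: the name `PatternClassifier` and its constructor are hypothetical, and it assumes the input is an iterable of strings.

```python
import numpy as np


class PatternClassifier(object):
    """Hypothetical stand-in: predicts 1 when `pattern` occurs in a text."""

    def __init__(self, pattern):
        self.pattern = pattern

    def predict(self, texts):
        # Substring test per document; assumes every item is a str
        return np.array([1 if self.pattern in item else 0 for item in texts])


clf = PatternClassifier("car")
predictions = clf.predict(["This car is broken.", "The bike is fine."])
print(predictions)
```

Only the first sentence contains "car", so it is the only one labelled 1.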

Best answer

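scikit-learn really wants your data in strict matrix form: x_train should be a numeric matrix, and y_train should be a numeric matrix or vector. The cross-validation routines convert ("arrayify") your input to make sure it is in a format suitable for the built-in classifiers.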

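Here, the arrayifying step is (effectively) creating a matrix of characters with as many columns as the longest text has characters. As a result, most rows have their remaining columns padded with np.nan values.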

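If you want to use your classifier like this, you need to avoid the built-in pipeline and cross-validation routines. You can iterate over the cross-validation folds and build your own score, like this: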

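# Old sklearn.cross_validation API, as in the traceback: folds are built from the labels
from sklearn.cross_validation import StratifiedKFold

for train, test in StratifiedKFold(target_classes):
    train_data = data[train]
    test_data = data[test]
    # Train with train, predict with test, score with your favorite scorer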
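A fuller version of that loop might look like the sketch below. It makes several assumptions: it uses the newer `sklearn.model_selection.StratifiedKFold` (available from scikit-learn 0.18) instead of the `sklearn.cross_validation` module shown in the traceback, the `PatternClassifier` class is a hypothetical stand-in for the asker's classifier, and the toy sentences and labels are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


class PatternClassifier(object):
    """Hypothetical stand-in for the asker's classifier."""

    def __init__(self, pattern):
        self.pattern = pattern

    def fit(self, texts, labels):
        # Nothing to learn in this toy version; the pattern is fixed
        return self

    def predict(self, texts):
        # Substring test per document; items are (NumPy) strings
        return np.array([1 if self.pattern in item else 0 for item in texts])


# Toy data (made up): raw strings and binary labels
data = np.array([
    "This car is broken.", "I sold my car.", "The car is red.", "A fast car.",
    "The bike is fine.", "I like trains.", "Rain again today.", "Nothing to see.",
])
target_classes = np.array([1, 1, 1, 1, 0, 0, 0, 0])

scores = []
skf = StratifiedKFold(n_splits=2)
for train, test in skf.split(data, target_classes):
    # Index the raw strings directly, so nothing converts them to a matrix
    clf = PatternClassifier("car").fit(data[train], target_classes[train])
    predictions = clf.predict(data[test])
    # Accuracy on this fold
    scores.append(np.mean(predictions == target_classes[test]))

mean_score = float(np.mean(scores))
print(mean_score)
```

Because the loop hands the raw strings straight to `fit` and `predict`, the classifier never sees the converted input that scikit-learn's built-in routines would produce.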

Regarding the Python sklearn predict function, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31825874/
