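python - What is the difference between the score method on a fitted model and accuracy_score in scikit-learn?

I would normally just post this to Stack Overflow, but I thought about it and realized that it is not really a coding question; it is a machine learning question.

Any other feedback on the code or anything else is much appreciated and welcome!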

The Jupyter Notebook

So I am working through the Titanic problem on Kaggle. I have my four datasets ready:

features_train
features_test
target_train
target_test

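With this in mind, I have two questions, although the second is the important one.

Question 1: Is my understanding of the next step correct?

We fit the model on the training data and then create a prediction (pred) that tries to predict from our features_test data. This means that pred and target_test should in theory be identical (if the model works perfectly).

This means that, to measure the model's accuracy, we can simply compare pred against target_test, which is what the accuracy_score function from sklearn does.

Question 2: What is the difference between using the model's score method and the accuracy_score function?

This is what confuses me. You can see in cell 97, the first cell under the "Model 1" heading, that I use: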

clf.score(features_test, target_test)

which results in

0.8609865470852018

However, later on I also use:

from sklearn.metrics import accuracy_score
print(accuracy_score(target_test, pred))

which also results in

0.8609865470852018

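How can these two scores be the same? Am I doing something wrong? Or are both steps basically doing the same thing? How..? Does the score() method effectively create the pred predictions and check them behind the scenes?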

Best answer

For questions like this, your best friend is arguably the documentation; quoting from the scikit-learn documentation on model evaluation:

There are 3 different APIs for evaluating the quality of a model’s predictions:

Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they are designed to solve. This is not discussed on this page, but in each estimator’s documentation.

Scoring parameter: Model-evaluation tools using cross-validation (such as model_selection.cross_val_score and model_selection.GridSearchCV) rely on an internal scoring strategy. This is discussed in the section The scoring parameter: defining model evaluation rules.

Metric functions: The metrics module implements functions assessing prediction error for specific purposes. These metrics are detailed in sections on Classification metrics, Multilabel ranking metrics, Regression metrics and Clustering metrics.
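A minimal sketch of the three APIs quoted above, side by side. The synthetic dataset from make_classification and the variable names are assumptions made for illustration; they are not the data or code from the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (an assumption, not the Kaggle data from the question).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1. Estimator score method: the estimator's default criterion.
score_from_method = clf.score(X_test, y_test)

# 2. Scoring parameter: the metric is named by a string and applied
#    inside cross-validation, one score per fold.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_train, y_train, cv=5, scoring="accuracy"
)

# 3. Metric function: called explicitly on true labels and predictions.
score_from_metric = accuracy_score(y_test, clf.predict(X_test))

print(score_from_method, score_from_metric, cv_scores)
```

The first and third calls evaluate the same fitted model on the same held-out data, while the second refits a fresh model on each cross-validation fold of the training data.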

In the documentation of all 3 classifiers used in your code (logistic regression, random forest, and decision tree), there is the same description:

score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.

This answers your second question for the specific models used.
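As a rough check of the point above, the sketch below fits the three classifier types on synthetic data and compares clf.score with accuracy_score on explicit predictions, plus a hand-computed fraction of matching labels. The data, hyperparameters, and the `results` dictionary are assumptions for illustration; only the variable names features_train, features_test, target_train, target_test, and pred follow the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the data from the question).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
features_train, features_test, target_train, target_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "decision tree": DecisionTreeClassifier(random_state=42),
}

results = {}
for name, clf in models.items():
    clf.fit(features_train, target_train)
    pred = clf.predict(features_test)

    via_method = clf.score(features_test, target_test)  # predicts internally
    via_metric = accuracy_score(target_test, pred)      # explicit predictions
    by_hand = float(np.mean(pred == target_test))       # fraction of matches

    results[name] = (via_method, via_metric, by_hand)
    print(f"{name}: score={via_method:.4f} "
          f"accuracy_score={via_metric:.4f} by hand={by_hand:.4f}")
```

For each classifier the three numbers should agree, which is consistent with score calling predict on the features and then computing the mean accuracy against the labels.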

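Nevertheless, you should always check the documentation before blindly trusting the score method that comes with an estimator; for linear regression and the decision tree regressor, for example, score returns the coefficient of determination R^2, which is practically never used by ML practitioners when building predictive models (statisticians often use it when building explanatory models, but that is another story).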
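To illustrate that score means something different for regressors, here is a small sketch comparing reg.score with r2_score on synthetic regression data. The dataset and the `r2_results` dictionary are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an assumption for illustration only).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

r2_results = {}
for reg in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    reg.fit(X_train, y_train)
    via_method = reg.score(X_test, y_test)                 # default: R^2
    via_metric = r2_score(y_test, reg.predict(X_test))     # explicit R^2
    r2_results[type(reg).__name__] = (via_method, via_metric)
    print(f"{type(reg).__name__}: score={via_method:.4f} r2_score={via_metric:.4f}")
```

If a different criterion is wanted for a regressor, one option is to call the corresponding function from sklearn.metrics on the predictions instead of relying on score.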

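By the way, I took a brief look at the code you linked to, and I see that you compute metrics such as MSE, MAE, and RMSE. Keep in mind that these are regression metrics and are not meaningful in a classification setting such as the one you face here (conversely, accuracy is meaningless in a regression setting).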
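A small sketch of the distinction, using made-up continuous values (an assumption for illustration). The regression metrics are computed normally, whereas accuracy_score, in the scikit-learn versions I am aware of, raises a ValueError when given continuous targets.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

# Made-up continuous targets and predictions (illustrative values only).
y_true = np.array([2.5, 0.0, 2.1, 7.8])
y_pred = np.array([3.0, -0.5, 2.0, 8.0])

# Regression metrics measure how far predictions are from the targets.
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE taken as the square root of MSE
print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f}")

# Accuracy counts exact label matches, so it is not defined for
# continuous values; scikit-learn is expected to reject the input.
try:
    accuracy_score(y_true, y_pred)
    accuracy_rejected = False
except ValueError as err:
    accuracy_rejected = True
    print("accuracy_score refused continuous targets:", err)
```

The reverse case is less obvious in practice: MSE or MAE on integer class labels will run without error, but the resulting number depends on the arbitrary numeric coding of the classes.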

A similar question about the difference between the score method on a fitted model and accuracy_score in scikit-learn can be found on Stack Overflow: https://stackoverflow.com/questions/54168780/
