python - Questions about early stopping in Scikit-Learn

Tags: python machine-learning scikit-learn mlp

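I have some questions about the Scikit-Learn MLPRegressor when early stopping is enabled:

Is the validation data (see 'validation_fraction') randomly selected, taken from the front, or taken from the back of the training data supplied?
Is the validation data the same or different across successive training iterations?
Is the validation data automatically included (refit) in the final stage of training?
When the validation score does not improve by at least tol for n_iter_no_change consecutive epochs, is the previous best regressor returned, or does fit() simply return the last regressor?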

Best answer

Is the validation data (see 'validation_fraction') randomly selected, at the front, or at the back of the training data supplied?

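MLPRegressor creates the validation data internally with train_test_split. The split is made once on the data passed to fit(), with test_size=validation_fraction and the estimator's random_state. In the scikit-learn source, the shuffle parameter is not forwarded to that call. It only controls whether samples are shuffled within each training epoch. The validation rows are therefore always a random subset rather than a block from the front or back. Fixing random_state makes the split reproducible.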
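The snippet below is a minimal sketch of what that internal call amounts to. It assumes the call looks like train_test_split(X, y, test_size=validation_fraction, random_state=...) with shuffle left at its default. The toy data is chosen so you can see which rows end up held out. An unshuffled split is shown only for contrast.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i holds the value i, so held-out rows are easy to identify
X = np.arange(20).reshape(-1, 1)
y = np.arange(20, dtype=float)

# Roughly what early stopping does internally: shuffle stays at its default (True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, random_state=np.random.RandomState(0)
)
print(sorted(X_val.ravel().tolist()))  # 2 randomly chosen rows

# For contrast only: an unshuffled split would take the last rows
_, X_tail, _, _ = train_test_split(X, y, test_size=0.1, shuffle=False)
print(X_tail.ravel().tolist())  # [18, 19]
```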

Is the validation data the same or different during successive iterations of the training?

The validation data is the same for all training iterations. The split is made once, before the first epoch.
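The fitted estimator does not expose the split itself, but its attributes are consistent with a single fixed hold-out: validation_scores_ stores one score per epoch, and best_validation_score_ is the highest of them. The sketch below checks this on synthetic data. The data, layer size and other settings are illustrative assumptions, not values from the question.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(400, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(400)

model = MLPRegressor(hidden_layer_sizes=(16,), early_stopping=True,
                     validation_fraction=0.2, n_iter_no_change=10,
                     max_iter=300, random_state=0)
with warnings.catch_warnings():
    # Reaching max_iter is harmless here; silence the warning if it happens
    warnings.simplefilter("ignore", ConvergenceWarning)
    model.fit(X, y)

# One R^2 score per epoch, each computed on the same held-out 20%
print(model.n_iter_, len(model.validation_scores_))
print(model.best_validation_score_ == max(model.validation_scores_))
```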

Will the validation data automatically be included/refit during the final stage of the training?

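No. The validation data is never used to train the model; it is only used to score it. There is no final refit on the combined training and validation data.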
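If you want the held-out rows to contribute to the final model, you have to do it yourself. The sketch below shows one common heuristic, which MLPRegressor does not do for you. First, use early stopping to pick the epoch with the best validation score. Then retrain on all rows for that many epochs with early stopping turned off. The epoch count is only approximate, because the full dataset yields more gradient steps per epoch. The data and settings are illustrative assumptions.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(400, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(400)

with warnings.catch_warnings():
    # Hitting max_iter is expected in step 2; silence the warning
    warnings.simplefilter("ignore", ConvergenceWarning)

    # Step 1: early stopping on an internal 20% hold-out to choose an epoch count
    es_model = MLPRegressor(hidden_layer_sizes=(16,), early_stopping=True,
                            validation_fraction=0.2, max_iter=300, random_state=0)
    es_model.fit(X, y)
    best_epoch = int(np.argmax(es_model.validation_scores_)) + 1

    # Step 2 (heuristic): retrain on every row for best_epoch epochs, no hold-out.
    # n_iter_no_change=best_epoch keeps the training-loss criterion from stopping early.
    full_model = MLPRegressor(hidden_layer_sizes=(16,), early_stopping=False,
                              max_iter=best_epoch, n_iter_no_change=best_epoch,
                              random_state=0)
    full_model.fit(X, y)

print(best_epoch, full_model.n_iter_)
```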

When the validation score is not improving by at least tol for n_iter_no_change consecutive epochs, will the previous best regressor be returned, or will the fit() function simply return the last regressor?

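If the validation score stops improving, early stopping ends training instead of continuing, which helps avoid overfitting. The model is then set back to the parameters that achieved the best validation score, not the parameters from the last epoch.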
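The same "keep the best snapshot" behaviour can be reproduced by hand with partial_fit. This is useful if you need your own validation set, for example a chronological one. The helper fit_keep_best below is hypothetical, not scikit-learn's internal code. Its stopping rule is modelled on the tol / n_iter_no_change criterion described in the question, and the data is synthetic.

```python
import copy
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def fit_keep_best(X, y, validation_fraction=0.1, n_iter_no_change=10,
                  tol=1e-4, max_epochs=200, seed=0):
    """Hypothetical helper: early stopping that returns the best snapshot."""
    # Split once; the same validation rows are scored after every epoch
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=validation_fraction, random_state=seed)
    model = MLPRegressor(hidden_layer_sizes=(16,), random_state=seed)
    best_score, best_model, stale = -np.inf, None, 0
    for _ in range(max_epochs):
        model.partial_fit(X_tr, y_tr)       # one pass over the training rows
        score = model.score(X_val, y_val)   # R^2 on the fixed validation set
        # Count epochs that fail to beat the best score by at least tol
        stale = stale + 1 if score < best_score + tol else 0
        if score > best_score:              # snapshot the best model so far
            best_score, best_model = score, copy.deepcopy(model)
        if stale > n_iter_no_change:
            break
    return best_model, best_score, X_val, y_val


rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(300)
best_model, best_score, X_val, y_val = fit_keep_best(X, y)
print(best_score, best_model.score(X_val, y_val))  # the two values match
```

Returning a deep copy taken at the best epoch, rather than the model as it stands when the loop exits, is what makes the result the "previous best regressor" and not the last one.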

This question (python - Questions about early stopping in Scikit-Learn) was originally asked on Stack Overflow: https://stackoverflow.com/questions/56559360/
