python - How to use warm start

I want to use the warm_start parameter to add training data to my random forest classifier. I expected it to be used like this:

clf = RandomForestClassifier(...)
clf.fit(get_data())
clf.fit(get_more_data(), warm_start=True)

But warm_start is a constructor parameter. So should I do something like this instead?

clf = RandomForestClassifier()
clf.fit(get_data())
clf = RandomForestClassifier(warm_start=True)
clf.fit(get_more_data())

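That makes no sense to me. Wouldn't a new call to the constructor discard the previous training data? I think I'm missing something.

Best answer

The basic pattern (taken from Miriam's answer):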

clf = RandomForestClassifier(warm_start=True)
clf.fit(get_data())
clf.fit(get_more_data())

would be the correct usage API-wise.

But there is a catch here.

As the docs say:

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.

This means the only thing warm_start can do for you is add new DecisionTrees. All the previous trees appear to be left untouched!

Let's check this against the source code:
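A minimal sketch of that behaviour, assuming synthetic data from make_classification as a stand-in for get_data() and get_more_data(), and assuming n_estimators is raised with set_params between the two fits. The sizes and variable names are illustrative only. It checks that the trees from the first fit are still the same objects after the second fit:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-ins for get_data() / get_more_data().
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_first, y_first = X[:200], y[:200]
X_more, y_more = X[200:], y[200:]

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X_first, y_first)
old_trees = list(clf.estimators_)  # references to the 10 trees fitted so far

# Raise n_estimators so the second fit has room to add trees.
clf.set_params(n_estimators=15)
clf.fit(X_more, y_more)

# Compare object identity: the first 10 entries should be the original trees.
kept = all(a is b for a, b in zip(old_trees, clf.estimators_))
print(len(clf.estimators_), kept)
```

If the old trees are indeed kept as-is, they were trained only on the first batch and the added trees only on the second, so no single tree has seen all of the data.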

n_more_estimators = self.n_estimators - len(self.estimators_)

if n_more_estimators < 0:
    raise ValueError('n_estimators=%d must be larger or equal to '
                     'len(estimators_)=%d when warm_start==True'
                     % (self.n_estimators, len(self.estimators_)))

elif n_more_estimators == 0:
    warn("Warm-start fitting without increasing n_estimators does not "
         "fit new trees.")

This basically tells us that you need to increase the number of estimators before starting a new fit!
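The two branches of that check can be observed from the outside. The following is a small sketch, again on hypothetical synthetic data, that refits once without changing n_estimators and once after lowering it. It matches on a fragment of the warning text quoted above, which may differ in other sklearn versions:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data, used only to trigger the two branches.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X, y)

# Same n_estimators as before: no new trees are fitted and a warning is issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X, y)
warned = any("does not fit new trees" in str(w.message) for w in caught)

# Fewer n_estimators than trees already fitted: fit raises ValueError.
clf.set_params(n_estimators=5)
try:
    clf.fit(X, y)
    raised = False
except ValueError:
    raised = True

print(warned, raised)
```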

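I don't know what kind of usage sklearn expects here. I'm not sure whether fitting, raising n_estimators on the fitted estimator, and fitting again is the intended usage, and I somewhat doubt it (n_estimators is a constructor parameter, so it would have to be changed after construction, for example with set_params).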

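Your basic approach (with this library and this classifier) is probably not a good idea for the out-of-core learning you are after! I would not pursue this any further.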

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/42757892/