scikit-learn RandomForestClassifier stopped working: any suggestions on how to debug?

Tags: scikit-learn

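I am running a grid search on a RandomForestClassifier. My code was working until I changed the features, and then it suddenly started raising the error shown below (on the classifier.fit line).

I did not change any code; I only reduced the feature dimension from 16 to 8. I have no idea what to look into. What does this error mean?

Error: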

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 344, in __call__
    return self.func(*args, **kwargs)
  File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py", line 120, in _parallel_build_trees
    tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
  File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py", line 739, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/zqz/Programs/anaconda3/lib/python3.5/site-packages/sklearn/tree/tree.py", line 246, in fit
    raise ValueError("max_features must be in (0, n_features]")
ValueError: max_features must be in (0, n_features]

Code:

    classifier = RandomForestClassifier(n_estimators=20, n_jobs=-1)
    rfc_tuning_params = {"max_depth": [3, 5, None],
                         "max_features": [1, 3, 5, 7, 10],
                         "min_samples_split": [2, 5, 10],
                         "min_samples_leaf": [1, 3, 10],
                         "bootstrap": [True, False],
                         "criterion": ["gini", "entropy"]}
    classifier = GridSearchCV(classifier, param_grid=rfc_tuning_params, cv=nfold,
                              n_jobs=cpus)
    model_file = os.path.join(os.path.dirname(__file__), "random-forest_classifier-%s.m" % task)
    classifier.fit(X_train, y_train) #line that causes the error
    nfold_predictions=cross_val_predict(classifier.best_estimator_, X_train, y_train, cv=nfold)

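Best answer

Your rfc_tuning_params contains "max_features": [1, 3, 5, 7, 10]. That list includes 10, which is greater than the number of features (8). When the grid search reaches that candidate, each tree rejects it because an integer max_features must lie in the interval (0, n_features], so you get the error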

ValueError: max_features must be in (0, n_features]

So you need to remove 10 from "max_features".
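To keep the grid valid when the feature count changes again, the candidates can be filtered against the number of features before building the grid. The following is only a minimal sketch: `filter_max_features` is a hypothetical helper (not part of scikit-learn), and `n_features` is hard-coded to 8 here, whereas in the question's code it would presumably come from `X_train.shape[1]`.

```python
def filter_max_features(candidates, n_features):
    """Keep only integer candidates inside the interval (0, n_features]."""
    return [m for m in candidates if 0 < m <= n_features]


# Assumption: 8 features, as in the question; with real data this
# would be something like X_train.shape[1].
n_features = 8

rfc_tuning_params = {
    "max_depth": [3, 5, None],
    # 10 is dropped because it exceeds n_features
    "max_features": filter_max_features([1, 3, 5, 7, 10], n_features),
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

print(rfc_tuning_params["max_features"])
```

The resulting dictionary can be passed to GridSearchCV as param_grid exactly as in the question. Another option is to give max_features as floats in (0, 1], which scikit-learn interprets as a fraction of the available features, so the grid does not depend on the absolute feature count.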

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/43161093/
