gbm - Parameter tuning of a gradient boosting classifier with GridSearchCV in Python

Tags: gbm, gridsearchcv

I am trying to run GradientBoostingClassifier() with the help of GridSearchCV.
For each combination of parameters, I also need the precision, recall, and accuracy in tabular format.

Here is the code:


scoring= ['accuracy', 'precision','recall']
parameters = {#'nthread':[3,4], #when use hyperthread, xgboost may become slower
               "criterion": ["friedman_mse",  "mae"],
              "loss":["deviance","exponential"],
              "max_features":["log2","sqrt"],
              'learning_rate': [0.01,0.05,0.1,1,0.5], #so called `eta` value
              'max_depth': [3,4,5],
              'min_samples_leaf': [4,5,6],

              'subsample': [0.6,0.7,0.8],
              'n_estimators': [5,10,15,20],#number of trees, change it to 1000 for better results
              'scoring':scoring

              }

# sorted(sklearn.metrics.SCORERS.keys()) # To see different loss functions
#clf_xgb = GridSearchCV(xgb_model, parameters, n_jobs=5,verbose=2, refit=True,cv = 8)
clf_gbm = GridSearchCV(gbm_model, parameters, n_jobs=5,cv = 8)

clf_gbm.fit(X_train,y_train)


print(clf_gbm.best_params_)
print(clf_gbm.best_score_)

feature_importances = pd.DataFrame(clf_gbm.best_estimator_.feature_importances_,
                                   index = X_train.columns,
                                    columns=['importance']).sort_values('importance', ascending=False)
print(feature_importances)
depth=clf_gbm.cv_results_["param_max_depth"]
score=clf_gbm.cv_results_["mean_test_score"]
params=clf_gbm.cv_results_["params"]


I get the following error:
      ValueError: Invalid parameter seed for estimator GradientBoostingClassifier(criterion='friedman_mse', init=None,
          learning_rate=0.01, loss='deviance', max_depth=3,
          max_features='log2', max_leaf_nodes=None,
          min_impurity_decrease=0.0, min_impurity_split=None,
          min_samples_leaf=4, min_samples_split=2,
          min_weight_fraction_leaf=0.0, n_estimators=5, presort='auto',
          random_state=None, subsample=1.0, verbose=0,
          warm_start=False). Check the list of available parameters with `estimator.get_params().keys()`.

Best answer

import numpy as np
import pandas as pd

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import make_scorer
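# Note: scoring is an argument of GridSearchCV itself, not of the estimator.
# GridSearchCV forwards every key of the parameter grid to the estimator's
# set_params(), so non-estimator keys (e.g. 'scoring', or leftover xgboost keys
# such as 'seed'/'nthread') trigger "ValueError: Invalid parameter ...".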
#creating the scoring dictionary:
scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score)}

# A sample parameter grid

parameters = {
    "loss":["deviance"],
    "learning_rate": [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2],
    "min_samples_split": np.linspace(0.1, 0.5, 12),
    "min_samples_leaf": np.linspace(0.1, 0.5, 12),
    "max_depth":[3,5,8],
    "max_features":["log2","sqrt"],
    "criterion": ["friedman_mse",  "mae"],
    "subsample":[0.5, 0.618, 0.8, 0.85, 0.9, 0.95, 1.0],
    "n_estimators":[10]
    }
#passing the scoring dictionary to GridSearchCV
clf = GridSearchCV(GradientBoostingClassifier(), parameters,scoring=scoring,refit=False,cv=2, n_jobs=-1)

clf.fit(trainX, trainY)
#converting clf.cv_results_ to a dataframe
df=pd.DataFrame.from_dict(clf.cv_results_)
#with cv=2 there are two cross-validation splits, split0 and split1
df[['split0_test_accuracy','split1_test_accuracy','split0_test_precision','split1_test_precision','split0_test_recall','split1_test_recall']]
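
Because multiple scorers were passed, cv_results_ also exposes aggregated mean_test_<metric> columns. As a minimal sketch (reusing the fitted clf and the pandas import from above), the metrics for every parameter combination can be collected into one table, which is the tabular view asked for in the question:

# Sketch: one row per parameter combination with the averaged test metrics.
# Assumes the multi-metric clf from above has already been fitted.
results = pd.DataFrame(clf.cv_results_)
summary = pd.concat(
    [results['params'].apply(pd.Series),   # one column per tuned parameter
     results[['mean_test_accuracy', 'mean_test_precision', 'mean_test_recall']]],
    axis=1)
print(summary.sort_values('mean_test_accuracy', ascending=False).head())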


Find the best parameters based on accuracy_score, precision_score, or recall, then refit the model and predict on the test data.


#find the best parameter based on the accuracy_score
#taking the average of the accuracy_score
df['accuracy_score']=(df['split0_test_accuracy']+df['split1_test_accuracy'])/2

df.loc[df['accuracy_score'].idxmax()]['params']
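
The same pattern works when precision or recall should drive the choice; a minimal sketch using the split columns already present in df:

#sketch: pick the parameter set with the best averaged precision (or recall),
#mirroring the accuracy-based selection above
df['precision_score'] = (df['split0_test_precision'] + df['split1_test_precision'])/2
df['recall_score'] = (df['split0_test_recall'] + df['split1_test_recall'])/2

print(df.loc[df['precision_score'].idxmax()]['params'])
print(df.loc[df['recall_score'].idxmax()]['params'])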


Prediction on the test data:

clf =GradientBoostingClassifier(criterion='mae',
 learning_rate=0.1,
 loss='deviance',
 max_depth= 5,
 max_features='sqrt',
 min_samples_leaf= 0.1,
 min_samples_split= 0.42727272727272736,
 n_estimators=10,
 subsample=0.8)
clf.fit(trainX, trainY)
correct_test = correct_data(test)
testX = correct_test[predictor].values
result = clf.predict(testX)
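
Instead of typing the winning values by hand, the params dict selected above can be unpacked directly. A short sketch, assuming df with the accuracy_score column from above and that trainX, trainY, and testX are built as shown earlier:

#sketch: refit with the selected parameter dict instead of hard-coding the values
best_params = df.loc[df['accuracy_score'].idxmax()]['params']   # dict of the winning combination
clf = GradientBoostingClassifier(**best_params)
clf.fit(trainX, trainY)
result = clf.predict(testX)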

About "gbm - parameter tuning of a gradient boosting classifier with GridSearchCV in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58781601/
