python - “Booster”对象没有属性 'score' - 准确性

标签 python machine-learning xgboost

在寻找 xgboost 的最佳参数时,我遇到了一个问题。

整个过程进行得很顺利,我设法将参数附加到模型并检查其准确性,但我的解决方案非常原始并且不太好( “手动”将参数附加到先前创建的模型)

当我尝试检查模型的准确性时,出现以下错误:

AttributeError: 'Booster' object has no attribute 'score'

准确度:

accuracy = classifier.score(X_test, y_test)
print(accuracy*100,'%')

我把所有代码放在下面(都是因为我不知道错误到底发生在哪里):

# Fitting XGBoost to the Training set
from xgboost import XGBClassifier
classifier = XGBClassifier()

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# here the accuracy is checked without any problem
accuracy = classifier.score(X_test, y_test)
print(accuracy*100,'%')

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    # Parameters that we are going to tune.
    'max_depth':6,
    'min_child_weight': 1,
    'eta':.3,
    'lambda': .1,
    'subsample': 1,
    'colsample_bytree': 1,
     # Other parameters
    'objective':'reg:squarederror',
}

params['eval_metric'] = "rmse"

num_boost_round = 999

model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    evals=[(dtest, "Test")],
    early_stopping_rounds=10
)

print("Best RMSE: {:.2f} with {} rounds".format(
                 model.best_score,
                 model.best_iteration+1))

cv_results = xgb.cv ( 
    params, 
    dtrain, 
    num_boost_round = num_boost_round, 
    seed = 42, 
    nfold = 5, 
    metrics = {'rmse'}, 
    early_stopping_rounds = 10 
) 
cv_results

cv_results ['test-rmse-mean']. min () 

gridsearch_params = [
    (max_depth, min_child_weight)
    for max_depth in range(9,12)
    for min_child_weight in range(5,8)
]

min_rmse = float("Inf")
best_params = None
for max_depth, min_child_weight in gridsearch_params:
    print("CV with max_depth={}, min_child_weight={}".format(
                             max_depth,
                             min_child_weight))
    # Update our parameters
    params['max_depth'] = max_depth
    params['min_child_weight'] = min_child_weight
    # Run CV
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=num_boost_round,
        seed=42,
        nfold=5,
        metrics={'rmse'},
        early_stopping_rounds=10
    )
    # Update best RMSE
    mean_rmse = cv_results['test-rmse-mean'].min()
    boost_rounds = cv_results['test-rmse-mean'].argmin()
    print("\RMSE {} for {} rounds".format(mean_rmse, boost_rounds))
    if mean_rmse < min_rmse:
        min_rmse = mean_rmse
        best_params = (max_depth,min_child_weight)
print("Best params: {}, {}, RMSE: {}".format(best_params[0], best_params[1], min_rmse))

params['max_depth'] = 9
params['min_child_weight'] = 7

gridsearch_params = [
    (subsample, colsample)
    for subsample in [i/10. for i in range(7,11)]
    for colsample in [i/10. for i in range(7,11)]
]

min_rmse = float("Inf")
best_params = None
# We start by the largest values and go down to the smallest
for subsample, colsample in reversed(gridsearch_params):
    print("CV with subsample={}, colsample={}".format(
                             subsample,
                             colsample))
    # We update our parameters
    params['subsample'] = subsample
    params['colsample_bytree'] = colsample
    # Run CV
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=num_boost_round,
        seed=42,
        nfold=5,
        metrics={'rmse'},
        early_stopping_rounds=10
    )
    # Update best score
    mean_rmse = cv_results['test-rmse-mean'].min()
    boost_rounds = cv_results['test-rmse-mean'].argmin()
    print("\tRMSE {} for {} rounds".format(mean_rmse, boost_rounds))
    if mean_rmse < min_rmse:
        min_rmse = mean_rmse
        best_params = (subsample,colsample)
print("Best params: {}, {}, RMSE: {}".format(best_params[0], best_params[1], min_rmse))

params['subsample'] = 1.0
params['colsample_bytree'] = 1.0

%time
# This can take some time…
min_rmse = float("Inf")
best_params = None
for eta in [.3, .2, .1, .05, .01, .005]:
    print("CV with eta={}".format(eta))
    # We update our parameters
    params['eta'] = eta
    # Run and time CV

    %time cv_results = xgb.cv(\
            params,\
            dtrain,\
            num_boost_round=num_boost_round,\
            seed=42,\
            nfold=5,\
            metrics=['rmse'],\
            early_stopping_rounds=10\
          )

    # Update best score
    mean_rmse = cv_results['test-rmse-mean'].min()
    boost_rounds = cv_results['test-rmse-mean'].argmin()
    print("\tRMSE {} for {} rounds\n".format(mean_rmse, boost_rounds))
    if mean_rmse < min_rmse:
        min_rmse = mean_rmse
        best_params = eta

print("Best params: {}, RMSE: {}".format(best_params, min_rmse))

params['eta'] = .2

classifier = xgb.train(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    evals=[(dtest, "Test")],
    early_stopping_rounds=10
)

num_boost_round = model.best_iteration + 1
best_model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    evals=[(dtest, "Test")]
)

from sklearn.metrics import mean_absolute_error
mean_absolute_error(best_model.predict(dtest), y_test)

best_model.save_model("my_model.model")

loaded_model = xgb.Booster()
loaded_model.load_model("my_model.model")

accuracy = classifier.score(X_test, y_test)
print(accuracy*100,'%')

第二次尝试检查准确性时,出现错误。

最佳答案

您的classifier对象是一个对象类型Booster,它不包含方法score

您可以使用predict方法获取预测并使用sklearn.metrics计算您的分数

关于python - “Booster”对象没有属性 'score' - 准确性,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57801762/

相关文章:

hadoop - 在 Google Cloud Dataproc 上运行 xgboost

machine-learning - 每个样本都有独特的真/假损失

python - 使用 pytesser 识别简单数字

python-3.x - 两个Python循环看起来应该做同样的事情,但输出不同的结果?

python - 连接转换网络的两个输出

python - 在前 X 个最重要的特征上运行 XGBoost 与使用变换方法之间的区别

python - 按特定顺序组合数据框中的两列

python - 使用 python/pandas 将数据框写入谷歌表格

python - 如何使用 flask SQLAlchemy 将文件保存在数据库和目录中?

machine-learning - 我如何知道训练数据足以用于机器学习