python - Is there a way to get all computed coefficients in GridSearchCV?

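I'm trying out different ML models, all of which use a pipeline containing a transformer and an algorithm, "nested" in a GridSearchCV to find the best hyperparameters.

When running Ridge, Lasso and ElasticNet regressions, I'd like to store all the computed coefficients, not just the best_estimator_ coefficients, so that I can plot their paths as a function of alpha. In other words, each time GridSearchCV changes the alpha parameter and fits a new model, I want to store the resulting coefficients and plot them against the alpha values.

See this official scikit-learn post for a nice example.

Here is my code: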

import time

import numpy as np  # needed for np.logspace and np.sqrt below
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

start = time.time()

# Cross-validated - Ridge Regression
model_ridge = make_pipeline(transformer, Ridge()) # my transformer is already defined 

alphas = np.logspace(-5, 5, num = 50)
params = {'ridge__alpha' : alphas}
    
grid = GridSearchCV(model_ridge, param_grid = params, cv=10)
grid.fit(X_train, y_train)
regressor = grid.estimator.named_steps['ridge'].coef_ # when I add this line, it returns an error
    
stop = time.time()
training_time = stop-start

y_pred = grid.predict(X_test)

Ridge_Regression_results = {'Algorithm' : 'Ridge Regression', 
                             'R²' : grid.score(X_train, y_train), 
                             'MAE' : mean_absolute_error(y_test, y_pred), 
                             'RMSE' : np.sqrt(mean_squared_error(y_test, y_pred)),
                             'Training time (sec)' : training_time}

In this thread: return coefficients from Pipeline object in sklearn, the author is advised to use the pipeline's named_steps attribute. But in my case, when I try to use it, it raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18260/3310195105.py in <module>
     13 
     14 grid.fit(X_train, y_train)
---> 15 regressor = grid.estimator.named_steps['ridge'].coef_
     16 
     17 

AttributeError: 'Ridge' object has no attribute 'coef_'

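I don't understand why this happens.

To make it work, I guess the coefficients should be stored during the GridSearchCV loop, but I don't know how to do that.

Best answer

You can get the coefficients by returning them as "scores", although that isn't quite right semantically.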

import pandas as pd

def myscores(estimator, X, y):
    r2 = estimator.score(X, y)
    coefs = estimator.named_steps["ridge"].coef_ 
    ret_dict = {
        f'a_{i}': coef for i, coef in enumerate(coefs)
    }
    ret_dict['r2'] = r2
    return ret_dict

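grid = GridSearchCV(
    model_ridge,
    param_grid=params,
    scoring=myscores,  # a callable returning a dict gives multi-metric scoring
    refit='r2'         # choose and refit the best model by the 'r2' entry
)
grid.fit(X_train, y_train)  # cv_results_ only exists after fitting

print(pd.DataFrame(grid.cv_results_))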
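
As a rough end-to-end illustration of the approach above, here is a minimal sketch that runs on its own. The synthetic data from make_regression and the StandardScaler are assumptions standing in for the asker's data and transformer, which are not shown in the question. It also shows why the original line fails: GridSearchCV fits clones, so grid.estimator stays an unfitted template and only grid.best_estimator_ carries coef_. Note that the values in cv_results_ are coefficients fitted on each training fold and then averaged over the folds, not coefficients from a single fit on the full training set.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data and StandardScaler replace the asker's data/transformer.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
model_ridge = make_pipeline(StandardScaler(), Ridge())
alphas = np.logspace(-5, 5, num=20)


def myscores(estimator, X, y):
    """Return R^2 plus each fitted Ridge coefficient as a named 'score'."""
    coefs = estimator.named_steps["ridge"].coef_
    scores = {f"a_{i}": coef for i, coef in enumerate(coefs)}
    scores["r2"] = estimator.score(X, y)
    return scores


grid = GridSearchCV(
    model_ridge,
    param_grid={"ridge__alpha": alphas},
    scoring=myscores,
    refit="r2",
    cv=5,
)
grid.fit(X, y)

# grid.estimator is the unfitted template; only the refit copy has coef_.
template_is_fitted = hasattr(grid.estimator.named_steps["ridge"], "coef_")
best_coefs = grid.best_estimator_.named_steps["ridge"].coef_

# One row per alpha, one column per coefficient (mean over the CV folds).
results = pd.DataFrame(grid.cv_results_)
coef_columns = [f"mean_test_a_{i}" for i in range(X.shape[1])]
coef_paths = results[coef_columns].to_numpy()
```

With matplotlib available, something like plt.semilogx(alphas, coef_paths) should then draw one path per coefficient against alpha.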

A similar question about "python - Is there a way to get all computed coefficients in GridSearchCV?" can be found on Stack Overflow: https://stackoverflow.com/questions/73444161/
