I want to calibrate an xgboost model that has already been trained. According to the documentation:
If “prefit” is passed, it is assumed that base_estimator has been fitted already and all data is used for calibration.
So I tried to use it as follows:
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.calibration import CalibratedClassifierCV
X, y = make_classification()
X = pd.DataFrame(X)
X.columns = ['var' + str(i) for i in range(1, 21)]
y = pd.Series(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = XGBClassifier()
model.fit(X_train, y_train)
calibrated = CalibratedClassifierCV(model, method='isotonic', cv='prefit')
calibrated.fit(X_test, y_test)
Unfortunately, this results in the following error:
ValueError: feature_names mismatch: ['var1', 'var2', 'var3', 'var4', 'var5', 'var6', 'var7', 'var8', 'var9', 'var10', 'var11', 'var12', 'var13', 'var14', 'var15', 'var16', 'var17', 'var18', 'var19', 'var20'] ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19'] expected var12, var10, var3, var1, var20, var15, var2, var9, var16, var7, var17, var11, var8, var5, var13, var4, var14, var6, var19, var18 in input data training data did not have the following fields: f2, f5, f16, f17, f13, f11, f18, f6, f9, f1, f12, f10, f19, f15, f14, f3, f7, f0, f4, f8
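I believe this may be because the features are stored in the xgboost object under the default names f1, f2, etc. So I tried renaming the X_test columns using X_test.rename(lambda x: x.replace('var', 'f'), axis = 1), but that does not solve the problem. So my question is: how can I fix this error and use CalibratedClassifierCV on an already-trained xgboost model?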
Best answer
Pandas is causing the problem. You are passing column names to the sklearn model, and that is what goes wrong here.
Use X_train, X_test, y_train, y_test = train_test_split(X.values, y.values) and everything will work.
For full compatibility, pass numpy arrays to sklearn functions rather than DataFrames.
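To make the point about column names concrete, here is a minimal sketch using only pandas and numpy. It assumes a small dummy frame with the same var1..var20 naming as in the question; the variable names X_array and renamed are only illustrative. It shows that .values yields a plain array with no names attached, and it also suggests why the rename attempt from the question has no visible effect: rename returns a new DataFrame, and the resulting names start at f1 rather than the f0 seen in the error message.

```python
import numpy as np
import pandas as pd

# Small dummy frame with the same naming scheme as in the question
X = pd.DataFrame(np.zeros((4, 20)), columns=['var' + str(i) for i in range(1, 21)])

# .values returns a plain numpy array, so no column names travel with the data
X_array = X.values
print(type(X_array).__name__, hasattr(X_array, 'columns'))

# rename returns a new DataFrame; X is unchanged unless the result is assigned back
renamed = X.rename(lambda name: name.replace('var', 'f'), axis=1)
print(list(X.columns[:3]))        # still the var... names
print(list(renamed.columns[:3]))  # names start at f1, while the error lists f0, f1, ...
```

Training on arrays from the start means the model never stores custom feature names, so there is nothing left to mismatch later.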
Full code:
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.calibration import CalibratedClassifierCV
X, y = make_classification()
X = pd.DataFrame(X)
X.columns = ['var' + str(i) for i in range(1, 21)]
y = pd.Series(y)
X_train, X_test, y_train, y_test = train_test_split(X.values, y.values)
model = XGBClassifier()
model.fit(X_train, y_train)
calibrated = CalibratedClassifierCV(model, method='isotonic', cv='prefit')
calibrated.fit(X_test, y_test)
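Once fitted, the calibrated wrapper can be used like any other classifier. The sketch below is not part of the original answer and rests on a few assumptions. GradientBoostingClassifier is used as a stand-in so the example does not depend on an installed xgboost; XGBClassifier() should slot in the same way. The names X_calib and y_calib are illustrative. As far as I know, recent scikit-learn releases replace cv='prefit' with a FrozenEstimator wrapper, so the code tries that first and falls back to the form used above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# make_classification already returns numpy arrays, so no column names are involved
X, y = make_classification(n_samples=400, random_state=0)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=0)

# Stand-in for XGBClassifier() (assumption: any already-fitted classifier works here)
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

try:
    # Newer scikit-learn: wrap the fitted model so calibration does not refit it
    from sklearn.frozen import FrozenEstimator
    calibrated = CalibratedClassifierCV(FrozenEstimator(model), method='isotonic')
except ImportError:
    # Older scikit-learn: the cv='prefit' form from the answer
    calibrated = CalibratedClassifierCV(model, method='isotonic', cv='prefit')

# Fit only the calibrator, on data the base model was not trained on
calibrated.fit(X_calib, y_calib)

# Calibrated class probabilities, one row per sample
proba = calibrated.predict_proba(X_calib)
print(proba.shape)
```

In practice the calibrated probabilities would be evaluated on a third split that neither the model nor the calibrator has seen.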
Regarding "python - How to use CalibratedClassifierCV on an already-trained xgboost model?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61508945/