python - Why does XGBRegressor prediction warn about a feature mismatch?

Tags: python nan xgboost imputation

I want to use XGBRegressor to predict some data, so I loaded the training data and the test data.

import pandas as pd

iowa_file_path = '../input/train.csv'
test_data_path = '../input/test.csv'

data = pd.read_csv(iowa_file_path)
test_data = pd.read_csv(test_data_path)

[Image: contents of the training data]

[Image: contents of the test data]

Then I do some data cleaning:

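from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Drop rows without a target, then keep only the numeric predictors.
data.dropna(axis=0, subset=['SalePrice'], inplace=True)
y = data.SalePrice
X = data.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])

# Split on plain NumPy arrays and fill missing values with the column mean.
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.25)
my_imputer = SimpleImputer()
train_X = my_imputer.fit_transform(train_X)
val_X = my_imputer.transform(val_X)

my_model = XGBRegressor(n_estimators=100, learning_rate=0.1)
my_model.fit(train_X, train_y, early_stopping_rounds=None,
    eval_set=[(val_X, val_y)], verbose=False)

# The test data is passed as a DataFrame, without going through my_imputer.
test_data_process = test_data.select_dtypes(exclude=['object'])
predictions = my_model.predict(test_data_process)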

But when I run the predict function, I get the following error message:


ValueError Traceback (most recent call last)
in ()
      1 test_data_process = test_data.select_dtypes(exclude=['object'])
----> 2 predictions = my_model.predict(test_data_process)

/opt/conda/lib/python3.6/site-packages/xgboost-0.80-py3.6.egg/xgboost/sklearn.py in predict(self, data, output_margin, ntree_limit, validate_features)
    395 output_margin=output_margin,
    396 ntree_limit=ntree_limit,
--> 397 validate_features=validate_features)
    398
    399 def apply(self, X, ntree_limit=0):

/opt/conda/lib/python3.6/site-packages/xgboost-0.80-py3.6.egg/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features)
   1206
   1207 if validate_features:
-> 1208 self._validate_features(data)
   1209
   1210 length = c_bst_ulong()

/opt/conda/lib/python3.6/site-packages/xgboost-0.80-py3.6.egg/xgboost/core.py in _validate_features(self, data)
   1508
   1509 raise ValueError(msg.format(self.feature_names,
-> 1510 data.feature_names))
   1511
   1512 def get_split_value_histogram(self, feature, fmap='', bins=None, as_pandas=True):

ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34', 'f35', 'f36'] ['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
expected f9, f6, f14, f27, f18, f7, f8, f23, f17, f22, f35, f0, f28, f29, f20, f31, f36, f25, f11, f21, f12, f24, f34, f10, f5, f32, f15, f26, f30, f1, f2, f16, f19, f3, f4, f33, f13 in input data
training data did not have the following fields: BsmtUnfSF, 1stFlrSF, LowQualFinSF, MSSubClass, WoodDeckSF, GrLivArea, MiscVal, YearBuilt, BsmtFinSF1, Fireplaces, MoSold, BsmtHalfBath, GarageYrBlt, FullBath, PoolArea, YrSold, HalfBath, 2ndFlrSF, KitchenAbvGr, OverallQual, Id, EnclosedPorch, ScreenPorch, GarageArea, BsmtFullBath, MasVnrArea, TotRmsAbvGrd, OverallCond, BedroomAbvGr, GarageCars, OpenPorchSF, YearRemodAdd, TotalBsmtSF, BsmtFinSF2, LotFrontage, 3SsnPorch, LotArea

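It says the features do not match and that the training data did not have these fields. But when I check the contents of data, it does have those columns. How can I fix this?

Best answer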

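Just to close this question:

The problem is that SimpleImputer was applied to the training and validation data but not to the test data. The model was therefore trained on plain NumPy arrays, whose columns XGBoost names f0 to f36, while predict received a DataFrame that still carried the original column names. That is the mismatch the error message lists.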
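A minimal sketch of that fix follows, assuming the same preprocessing as in the question. The synthetic DataFrame, its values, and the name test_X are hypothetical stand-ins for train.csv and test.csv so that the snippet runs on its own. The model itself is left out to keep the example focused on the preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for train.csv / test.csv: two numeric columns,
# one text column, and a few missing values.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "LotArea": rng.uniform(5000, 15000, size=40),
    "YearBuilt": rng.integers(1950, 2010, size=40).astype(float),
    "Street": pd.Series(["Pave"] * 40, dtype=object),
})
data["SalePrice"] = 20 * data["LotArea"] + 500 * (data["YearBuilt"] - 1950)
test_data = data.drop(columns="SalePrice").iloc[:10].copy()
data.loc[::7, "LotArea"] = np.nan
test_data.loc[::3, "YearBuilt"] = np.nan

# Same preparation as in the question.
y = data.SalePrice
X = data.drop(["SalePrice"], axis=1).select_dtypes(exclude=["object"])
train_X, val_X, train_y, val_y = train_test_split(
    X.values, y.values, test_size=0.25, random_state=0)

my_imputer = SimpleImputer()
train_X = my_imputer.fit_transform(train_X)
val_X = my_imputer.transform(val_X)

# The fix: select the same columns, then reuse the already-fitted imputer.
# The result is a NumPy array, just like train_X and val_X.
test_data_process = test_data.select_dtypes(exclude=["object"])
test_X = my_imputer.transform(test_data_process.values)

print(type(test_X).__name__, test_X.shape, np.isnan(test_X).any())
```

With the model from the question, the call would then be my_model.predict(test_X) instead of my_model.predict(test_data_process), so the model sees an unnamed array at prediction time as it did during training.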

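A discussion of what causes this kind of error can be found here: https://github.com/dmlc/xgboost/issues/2334#issuecomment-333195491

Regarding "python - Why does XGBRegressor prediction warn about a feature mismatch?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52398578/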
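Another way to avoid this kind of mismatch is to keep the column labels on both sides, so the model is fitted and queried with DataFrames that share the same columns in the same order. This is a sketch under that assumption, not something the answer prescribes. The helper impute_keep_columns and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_keep_columns(imputer, frame, fit=False):
    """Impute a numeric DataFrame and return a DataFrame with the same labels.

    Assumes no column is entirely NaN at fit time; by default SimpleImputer
    drops such columns, and the labels would no longer line up.
    """
    values = imputer.fit_transform(frame) if fit else imputer.transform(frame)
    return pd.DataFrame(values, columns=frame.columns, index=frame.index)


# Hypothetical sample values; the test columns arrive in a different order.
train = pd.DataFrame({"LotArea": [8450.0, np.nan, 11250.0],
                      "YearBuilt": [2003.0, 1976.0, 2001.0]})
test = pd.DataFrame({"YearBuilt": [1961.0, np.nan],
                     "LotArea": [11622.0, 14267.0]})

imputer = SimpleImputer()
train_X = impute_keep_columns(imputer, train, fit=True)

# Align the test columns to the training columns before imputing.
test_X = impute_keep_columns(imputer, test.reindex(columns=train.columns))

print(list(train_X.columns) == list(test_X.columns))
```

Fitting on train_X and predicting on test_X then gives both calls the same feature names in the same order.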
