I want to use XGBRegressor to predict some data, so I loaded the training data and the test data.
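import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

iowa_file_path = '../input/train.csv'
test_data_path = '../input/test.csv'
data = pd.read_csv(iowa_file_path)
test_data = pd.read_csv(test_data_path)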
[Image: contents of data]
[Image: contents of test_data]
Then I do some data cleaning:
data.dropna(axis=0, subset=['SalePrice'], inplace=True)
y = data.SalePrice
X = data.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size =0.25)
my_imputer = SimpleImputer()
train_X = my_imputer.fit_transform(train_X)
val_X = my_imputer.transform(val_X)
my_model = XGBRegressor(n_estimators=100, learning_rate=0.1)
my_model.fit(train_X, train_y, early_stopping_rounds=None,
             eval_set=[(val_X, val_y)], verbose=False)
test_data_process = test_data.select_dtypes(exclude=['object'])
predictions = my_model.predict(test_data_process)
But when I run the predict function, I get the following error message:
ValueError                                Traceback (most recent call last)
in ()
      1 test_data_process = test_data.select_dtypes(exclude=['object'])
----> 2 predictions = my_model.predict(test_data_process)

/opt/conda/lib/python3.6/site-packages/xgboost-0.80-py3.6.egg/xgboost/sklearn.py in predict(self, data, output_margin, ntree_limit, validate_features)
    395                                           output_margin=output_margin,
    396                                           ntree_limit=ntree_limit,
--> 397                                           validate_features=validate_features)
    398
    399     def apply(self, X, ntree_limit=0):

/opt/conda/lib/python3.6/site-packages/xgboost-0.80-py3.6.egg/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features)
   1206
   1207         if validate_features:
-> 1208             self._validate_features(data)
   1209
   1210         length = c_bst_ulong()

/opt/conda/lib/python3.6/site-packages/xgboost-0.80-py3.6.egg/xgboost/core.py in _validate_features(self, data)
   1508
   1509             raise ValueError(msg.format(self.feature_names,
-> 1510                                         data.feature_names))
   1511
   1512     def get_split_value_histogram(self, feature, fmap='', bins=None, as_pandas=True):

ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34', 'f35', 'f36'] ['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
expected f9, f6, f14, f27, f18, f7, f8, f23, f17, f22, f35, f0, f28, f29, f20, f31, f36, f25, f11, f21, f12, f24, f34, f10, f5, f32, f15, f26, f30, f1, f2, f16, f19, f3, f4, f33, f13 in input data
training data did not have the following fields: BsmtUnfSF, 1stFlrSF, LowQualFinSF, MSSubClass, WoodDeckSF, GrLivArea, MiscVal, YearBuilt, BsmtFinSF1, Fireplaces, MoSold, BsmtHalfBath, GarageYrBlt, FullBath, PoolArea, YrSold, HalfBath, 2ndFlrSF, KitchenAbvGr, OverallQual, Id, EnclosedPorch, ScreenPorch, GarageArea, BsmtFullBath, MasVnrArea, TotRmsAbvGrd, OverallCond, BedroomAbvGr, GarageCars, OpenPorchSF, YearRemodAdd, TotalBsmtSF, BsmtFinSF2, LotFrontage, 3SsnPorch, LotArea
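It says the feature names do not match and that my training data did not have these fields. But when I check the contents of data, those columns are there. How can I fix this?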
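Best answer
Just to close out this question:
The problem is that SimpleImputer was applied to the training and validation data but not to the test data. The model was fit on NumPy arrays (X.values, then the imputer output), so XGBoost assigned the default feature names f0 through f36. The test data was passed to predict as a pandas DataFrame, whose column names ('Id', 'MSSubClass', ...) do not match those defaults. Running the test data through the same fitted imputer gives predict an array in the same form as the training input.
A discussion of what causes this kind of error can be found here: https://github.com/dmlc/xgboost/issues/2334#issuecomment-333195491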
Source: a similar question on Stack Overflow, "python - Why does XGBRegressor predict warn of a feature mismatch?": https://stackoverflow.com/questions/52398578/