machine-learning - Problem stacking classifiers (sklearn and Keras models) with StackingCVClassifier

Tags: machine-learning keras scikit-learn deep-learning mlxtend

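I'm fairly new to the mlxtend and Keras packages, so please bear with me. I've been trying to combine the predictions of several models, namely Random Forest, Logistic Regression, and Neural Network models, using StackingCVClassifier. I'm trying to stack these classifiers, each of which operates on a different subset of the features. Please see the code below.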

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from mlxtend.classifier import StackingCVClassifier
from mlxtend.feature_selection import ColumnSelector
from sklearn.pipeline import make_pipeline
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification()
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# defining the neural network model
def create_model():
    # create model (input_dim=10 because pipeline3 passes 10 feature columns)
    model = Sequential()
    model.add(Dense(10, input_dim=10, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(units=1, activation='sigmoid'))
    # compile model
    optimizer = keras.optimizers.RMSprop(learning_rate=0.001)
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer, metrics=[keras.metrics.AUC(), 'accuracy'])
    return model

# using KerasClassifier on the neural network model
NN_clf = KerasClassifier(build_fn=create_model, epochs=5, batch_size=5)
NN_clf._estimator_type = "classifier"

# stacking of classifiers that operate on different feature subsets
pipeline1 = make_pipeline(ColumnSelector(cols=(np.arange(0, 5, 1))), LogisticRegression())
pipeline2 = make_pipeline(ColumnSelector(cols=(np.arange(5, 10, 1))), RandomForestClassifier())
pipeline3 = make_pipeline(ColumnSelector(cols=(np.arange(10, 20, 1))), NN_clf)

# final stacking
clf = StackingCVClassifier(classifiers=[pipeline1, pipeline2, pipeline3], meta_classifier=MLPClassifier())
clf.fit(X_train, y_train)

print("Stacking model score: %.3f" % clf.score(X_val, y_val))

However, I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-11-ef342536824f> in <module>
     42 # final stacking
     43 clf = StackingCVClassifier(classifiers=[pipeline1, pipeline2, pipeline3], meta_classifier=MLPClassifier())
---> 44 clf.fit(X_train, y_train)
     45 
     46 print("Stacking model score: %.3f" % clf.score(X_val, y_val))

~\anaconda3\lib\site-packages\mlxtend\classifier\stacking_cv_classification.py in fit(self, X, y, groups, sample_weight)
    282                 meta_features = prediction
    283             else:
--> 284                 meta_features = np.column_stack((meta_features, prediction))
    285 
    286         if self.store_train_meta_features:

~\anaconda3\lib\site-packages\numpy\core\overrides.py in column_stack(*args, **kwargs)

~\anaconda3\lib\site-packages\numpy\lib\shape_base.py in column_stack(tup)
    654             arr = array(arr, copy=False, subok=True, ndmin=2).T
    655         arrays.append(arr)
--> 656     return _nx.concatenate(arrays, 1)
    657 
    658 

~\anaconda3\lib\site-packages\numpy\core\overrides.py in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)

Please help. Thanks!

Best answer

The error occurs because you are combining the predictions of traditional ML models with those of a DL model.

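The ML models give predictions of shape (80,1), while the DL model gives predictions of shape (80,1,1), so the shapes don't match when mlxtend tries to stack all the predictions together with np.column_stack. The extra axis appears because mlxtend adds a new axis to every model's prediction (prediction[:, np.newaxis]): sklearn classifiers return a 1-D array of shape (80,), which becomes (80,1), but the Keras wrapper already returns a column of shape (80,1), which becomes (80,1,1).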
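The mismatch can be reproduced with NumPy alone. This sketch assumes, as described above, that an sklearn model's predict() returns shape (80,) and the Keras wrapper returns shape (80, 1); it then applies the same `[:, np.newaxis]` step mlxtend uses when use_probas=False. The arrays are dummy zeros, not real predictions.

```python
import numpy as np

n_samples = 80  # size of the training split in the question (100 * 0.8)

# Dummy outputs standing in for the two kinds of predict() results.
sklearn_pred = np.zeros(n_samples)     # shape (80,), like sklearn classifiers
keras_pred = np.zeros((n_samples, 1))  # shape (80, 1), like the Keras wrapper

# mlxtend adds a new axis to each prediction when use_probas=False.
sk_col = sklearn_pred[:, np.newaxis]   # shape (80, 1)
keras_col = keras_pred[:, np.newaxis]  # shape (80, 1, 1)

try:
    np.column_stack((sk_col, keras_col))
    error = None
except ValueError as exc:  # same kind of error as in the traceback
    error = exc

print(sk_col.shape, keras_col.shape)  # (80, 1) (80, 1, 1)
print(type(error).__name__)           # ValueError
```

The 2-D and 3-D arrays cannot be concatenated along axis 1, which is exactly what the last line of the traceback complains about.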

The usual fix is to remove the extra dimension from the DL model's predictions, so that they have shape (80,1) instead of (80,1,1).
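As a minimal sketch, the fix can be written as a standalone function. `drop_extra_axis` is a hypothetical name used only for illustration; its body is the same conditional squeeze that this answer adds to mlxtend (with `prediction.ndim`, which is equivalent to `len(prediction.shape)`).

```python
import numpy as np


def drop_extra_axis(prediction):
    """Remove the singleton axis 1 from 3-D predictions, e.g. (80, 1, 1) -> (80, 1).

    2-D predictions are returned unchanged.
    """
    return prediction.squeeze(axis=1) if prediction.ndim > 2 else prediction


sk_col = np.zeros((80, 1))      # sklearn prediction after mlxtend's np.newaxis
keras_col = np.ones((80, 1, 1)) # Keras prediction after mlxtend's np.newaxis

meta_features = np.column_stack((drop_extra_axis(sk_col), drop_extra_axis(keras_col)))
print(meta_features.shape)  # (80, 2)
```

Note that `squeeze(axis=1)` raises a ValueError if axis 1 does not have length 1, so this only handles the (n, 1, 1) case described here.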

So, open the .py file located at: anaconda3\lib\site-packages\mlxtend\classifier\stacking_cv_classification.py

After the if blocks at lines 280 and 356 (outside of them), add the following line:

prediction = prediction.squeeze(axis=1) if len(prediction.shape)>2 else prediction

So it will look like this:

...
...
...
if not self.use_probas:
    prediction = prediction[:, np.newaxis]
elif self.drop_proba_col == "last":
    prediction = prediction[:, :-1]
elif self.drop_proba_col == "first":
    prediction = prediction[:, 1:]
prediction = prediction.squeeze(axis=1) if len(prediction.shape)>2 else prediction

if meta_features is None:
    meta_features = prediction
else:
    meta_features = np.column_stack((meta_features, prediction))
...
...
...

for model in self.clfs_:
    if not self.use_probas:
        prediction = model.predict(X)[:, np.newaxis]
    else:
        if self.drop_proba_col == "last":
            prediction = model.predict_proba(X)[:, :-1]
        elif self.drop_proba_col == "first":
            prediction = model.predict_proba(X)[:, 1:]
        else:
            prediction = model.predict_proba(X)
    prediction = prediction.squeeze(axis=1) if len(prediction.shape)>2 else prediction
    per_model_preds.append(prediction)
...
...
...
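To see the patched prediction loop in isolation, the sketch below mimics its use_probas=False branch with two hypothetical stub models: `SklearnLikeModel` returns a 1-D array like sklearn's predict(), and `KerasLikeModel` returns an (n, 1) column like the Keras wrapper described above. These stubs and `stack_predictions` are illustrations, not mlxtend or Keras classes; stacking with np.hstack is an assumption about how the per-model predictions are combined.

```python
import numpy as np


class SklearnLikeModel:
    """Hypothetical stub: predict() returns shape (n,), like sklearn classifiers."""

    def predict(self, X):
        return (X[:, 0] > 0).astype(int)


class KerasLikeModel:
    """Hypothetical stub: predict() returns shape (n, 1), like the Keras wrapper."""

    def predict(self, X):
        return (X[:, :1] > 0).astype(int)


def stack_predictions(models, X):
    """Mirror the patched loop: add an axis, squeeze 3-D results, then stack."""
    per_model_preds = []
    for model in models:
        prediction = model.predict(X)[:, np.newaxis]
        # The line added by the answer: drop the extra axis from (n, 1, 1) outputs.
        prediction = prediction.squeeze(axis=1) if len(prediction.shape) > 2 else prediction
        per_model_preds.append(prediction)
    return np.hstack(per_model_preds)


X = np.random.default_rng(0).normal(size=(80, 20))
meta = stack_predictions([SklearnLikeModel(), SklearnLikeModel(), KerasLikeModel()], X)
print(meta.shape)  # (80, 3)
```

With the squeeze line removed, the third model's prediction stays (80, 1, 1) and the stacking step fails with the same kind of dimension-mismatch ValueError.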

A similar question about machine-learning - Problem stacking classifiers (sklearn and Keras models) with StackingCVClassifier can be found on Stack Overflow: https://stackoverflow.com/questions/74953148/
