python - 我无法使用随机森林输出最佳选择的特征？

我制作了随机森林分类器，其阈值 = 0.15，但是当我尝试迭代所选模型时，它不会输出最佳选择的特征。

代码:

X = data.loc[:,'IFATHER':'VEREP']
y = data.loc[:,'Criminal']

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score

    # Split the data into 30% test and 70% training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Train the classifier
clf.fit(X_train, y_train)

# Print the name and gini importance of each feature
for feature in zip(X, clf.feature_importances_):
    print(feature)

# Create a selector object that will use the random forest classifier to identify
# features that have an importance of more than 0.15
sfm = SelectFromModel(clf, threshold=0.15)

# Train the selector
sfm.fit(X_train, y_train)

下面的代码不起作用:

# Print the names of the most important features
for feature_list_index in sfm.get_support(indices=True):
    print(X[feature_list_index])

我能够使用随机森林分类器而不是使用阈值来获取每个特征的特征重要性。我认为 get_support() 不是正确的方法。

屏幕截图:

最佳答案

创建包含最重要特征的新 X 数据集:

X_selected_features = sfm.fit_transform(X_train, y_train)

要查看功能名称:

features = np.array(list_of_feature_names)
print(features[sfm.get_support()])

如果X是Pandas.DataFrame:

features = X.columns.values

关于python - 我无法使用随机森林输出最佳选择的特征？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/49339350/

python - 我无法使用随机森林输出最佳选择的特征？

上一篇：Python Flask 分页仅提交最后一页

下一篇：python - 训练 CNN 时 GPU 使用率低