python - Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

Tags: python scikit-learn pipeline

I want to get the feature names after fitting the pipeline.

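from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical_features = ['brand', 'category_name', 'sub_category']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
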
numeric_features = ['num1', 'num2', 'num3', 'num4']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

Then

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('regressor', GradientBoostingRegressor())])

After fitting it on a pandas DataFrame, I can get the feature importances from

clf.steps[1][1].feature_importances_

I tried clf.steps[0][1].get_feature_names(), but I got this error:

AttributeError: Transformer num (type Pipeline) does not provide get_feature_names.

How can I get the feature names from it?

Best answer

You can access the feature names with the following snippet:

clf.named_steps['preprocessor'].transformers_[1][1]\
   .named_steps['onehot'].get_feature_names(categorical_features)

With sklearn >= 0.21, we can make it simpler:

clf['preprocessor'].transformers_[1][1]\
    ['onehot'].get_feature_names(categorical_features)

Reproducible example:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'brand': ['aaaa', 'asdfasdf', 'sadfds', 'NaN'],
                   'category': ['asdf', 'asfa', 'asdfas', 'as'],
                   'num1': [1, 1, 0, 0],
                   'target': [0.2, 0.11, 1.34, 1.123]})

numeric_features = ['num1']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['brand', 'category']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('regressor', LinearRegression())])
# Pass columns= explicitly: a positional axis argument to drop() fails in pandas 2.0+
clf.fit(df.drop(columns='target'), df['target'])

clf.named_steps['preprocessor'].transformers_[1][1]\
   .named_steps['onehot'].get_feature_names(categorical_features)

# ['brand_NaN' 'brand_aaaa' 'brand_asdfasdf' 'brand_sadfds' 'category_as'
#  'category_asdf' 'category_asdfas' 'category_asfa']

Regarding python - Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54646709/
