python - How can I get tree contributions from treeinterpreter when using a Pipeline?

Tags: python numpy scikit-learn random-forest

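I am using sklearn's Pipeline to combine one-hot encoding with a model, almost exactly as in this post.

Since switching to the Pipeline, I can no longer get the tree contributions. I get this error: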

AttributeError: 'Pipeline' object has no attribute 'n_outputs_'

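I tried playing with treeinterpreter's parameters, but I am stuck.

So my question is: when we use sklearn's Pipeline, is there any way to get the contributions from the trees?

Edit 2 - real data, as requested by Venkatachalam: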

# Data DF to train model
import pandas as pd

df = pd.DataFrame(
    [['SGOHC', 'd', 'onetwothree', 'BAN', 488.0580347, 960, 841, 82, 0.902497027, 841, 0.548155625, 0.001078211, 0.123958333, 1],
     ['ABCDEFGHIJK', 'SOC', 'CON', 'CAN', 680.84, 1638, 0, 0, 0, 0, 3.011140743, 0.007244358, 1, 0],
     ['Hello', 'AA', 'onetwothree', 'SPEAKER', 5823.230967, 2633, 1494, 338, 0.773761714, 1494, 12.70144386, 0.005743015, 0.432586403, 8]],
    columns=['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'target'])

# Create test and train set (useless, but for the example...) 
from sklearn.model_selection  import train_test_split

# Define X and y 
X = df.drop('target', axis=1)
y = df['target']

# Create Train and Test Sets 
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)


# Make the pipeline and model
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
import numpy as np
import pandas as pd
from sklearn import set_config
from sklearn.model_selection import ParameterGrid
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

rfr = Pipeline([('preprocess',
                   ColumnTransformer([('ohe',
                                       OneHotEncoder(handle_unknown='ignore'), [1])])),
                  ('rf', RandomForestRegressor())])

rfr.fit(X_train, Y_train)


# The New, Real data that we need to predict & explain! 

new_data = pd.DataFrame(
    [['DEBTYIPL', 'de', 'onetwothreefour', 'BANAAN', 4848.0580347, 923460, 823441, 5, 0.902497027, 43, 0.548155625, 0.001078211, 0.123958333],
     ['ABCDEFGHIJK', 'SOC', 'CON', 'CAN23', 680.84, 1638, 0, 0, 0, 0, 1.011140743, 4.007244358, 1],
     ['Hello_NO', 'AAAAa', 'onetwothree', 'SPEAKER', 5823.230967, 123, 32, 22, 0.773761714, 1678, 12.70144386, 0.005743015, 0.432586403]],
    columns=['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'])
new_data.head()

# Predicting the values 
rfr.predict(new_data)

# Now the error... the contributions: 
from treeinterpreter import treeinterpreter as ti
prediction, bias, contributions = ti.predict(rfr[-1], rfr[:-1].fit_transform(new_data))

#ValueError: Number of features of the model must match the input. Model n_features is 2 and input n_features is 3 

Best answer

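You can get the final estimator by indexing the pipeline object with model[-1]. Similarly, model[:-1] gives a new pipeline that contains all the transformation steps and excludes the final estimator.

So this is all you need to do. Here model is the fitted pipeline (rfr in the question) and df is the data to explain (new_data in the question). Note that the call uses transform rather than fit_transform: fit_transform refits the encoder on the new data, so the number of one-hot columns can differ from what the forest was trained on, which is what the ValueError in the question reports.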

prediction, bias, contributions = ti.predict(model[-1], model[:-1].transform(df))
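
The slicing above can be checked without treeinterpreter. The following is a minimal sketch on made-up toy data (the column names, values and `n_estimators` setting are assumptions for illustration, not the data from the question). It shows that the fitted forest, unlike the Pipeline, exposes `n_outputs_`, and that `transform` keeps the training-time column count while a refit on the new data does not.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column "C" and one numeric column "F".
train = pd.DataFrame({"C": ["d", "SOC"], "F": [488.06, 680.84]})
target = [1, 0]
new_data = pd.DataFrame({"C": ["de", "SOC", "AAAAa"], "F": [4848.06, 680.84, 5823.23]})

model = Pipeline([
    ("preprocess", ColumnTransformer(
        [("ohe", OneHotEncoder(handle_unknown="ignore"), ["C"])])),
    ("rf", RandomForestRegressor(n_estimators=10, random_state=0)),
])
model.fit(train, target)

forest = model[-1]       # the fitted RandomForestRegressor
preprocess = model[:-1]  # a Pipeline holding only the fitted transformer steps

# The Pipeline does not expose n_outputs_, the fitted forest does.
has_attr_pipeline = hasattr(model, "n_outputs_")
has_attr_forest = hasattr(forest, "n_outputs_")

# transform reuses the categories learned from the training data.
X_new = preprocess.transform(new_data)

# Refitting on new_data learns its three categories instead.
# clone() gives an unfitted copy, so the fitted pipeline is left untouched.
X_refit = clone(preprocess).fit_transform(new_data)

print(has_attr_pipeline, has_attr_forest)
print(X_new.shape, X_refit.shape, forest.n_features_in_)

# With treeinterpreter installed, the call from the answer would then be:
# prediction, bias, contributions = ti.predict(forest, X_new)
```

Only `X_new` has the same number of columns as the forest was trained on, so it is the one to pass on to `ti.predict`.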

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/65040249/
