python - XGBoost: getting predict_contrib with the sklearn API?

Tags: python scikit-learn xgboost

In Python, XGBoost lets you train and predict either with its Booster class or with its sklearn API (http://xgboost.readthedocs.io/en/latest/python/python_api.html). I am using the sklearn API and would like to use XGBoost's pred_contribs feature. I expected this to work, but it does not:

model = xgb.XGBClassifier().fit(X_train, y_train)
pred = model.predict_proba(X_test, pred_contribs=True)

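It looks like pred_contribs is only a parameter of the Booster class's predict function. How can I use this parameter through the sklearn API? Or is there a simple workaround for getting the prediction contributions after training with the sklearn API?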

Best answer

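You can use the get_booster() method of XGBClassifier. Once the XGBClassifier has been fitted to the training data, this method returns the underlying Booster object.

After that, you can simply call predict() on the Booster object with pred_contribs=True.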

Example code:

from xgboost import XGBClassifier, DMatrix
from sklearn.datasets import load_iris

iris_data = load_iris()

# Take only the first 100 samples (classes 0 and 1) to make this a binary problem;
# otherwise it is multi-class and the shape of pred_contribs changes
X, y = iris_data.data[:100], iris_data.target[:100]

# This data has 4 features
print(X.shape)  # Output: (100, 4)


clf = XGBClassifier()
clf.fit(X, y)

# This is what you need
booster = clf.get_booster()


# Predict for a single sample here; multiple rows work the same way
test_X = X[:1]

# Booster.predict needs the test data wrapped in a DMatrix
predictions = booster.predict(DMatrix(test_X), pred_contribs=True)

# Output has 5 columns: 1 for each feature, and the last one for the bias
print(predictions.shape)  # Output: (1, 5)

Regarding python - XGBoost: getting predict_contrib with the sklearn API?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49697514/
