scikit-learn - Pruning and Boosting in Decision Trees

Tags: scikit-learn

How do I use pruning and boosting in decision-tree-based classification methods?

I have 10 features and 3000 samples.

Best Answer

Here is an example demonstrating how to use boosting.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import classification_report

# generate some artificial data
X, y = make_classification(n_samples=3000, n_features=10, n_informative=2, flip_y=0.1, weights=[0.15, 0.85], random_state=0)

# train/test split
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_index, test_index = next(split.split(X, y))
X_train, y_train = X[train_index], y[train_index]
X_test, y_test = X[test_index], y[test_index]

# boosting: many weak classifiers (max_depth=1) are fitted sequentially, each one correcting the previous ones
# a decision tree is the default base estimator
estimator = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=1, random_state=0)
estimator.fit(X_train, y_train)
y_pred = estimator.predict(X_test)
print(classification_report(y_test, y_pred))

             precision    recall  f1-score   support

          0       0.88      0.80      0.84       109
          1       0.96      0.98      0.97       491

avg / total       0.94      0.94      0.94       600

# benchmark: a standard tree
tree_benchmark = DecisionTreeClassifier(max_depth=3, class_weight='balanced')
tree_benchmark.fit(X_train, y_train)
y_pred_benchmark = tree_benchmark.predict(X_test)
print(classification_report(y_test, y_pred_benchmark))

             precision    recall  f1-score   support

          0       0.63      0.84      0.72       109
          1       0.96      0.89      0.92       491

avg / total       0.90      0.88      0.89       600
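
The question also asks about pruning. The benchmark above uses pre-pruning (limiting max_depth), which, together with parameters such as min_samples_leaf, is one way to control tree complexity in scikit-learn. Newer scikit-learn versions (0.22+) also support post-pruning via minimal cost-complexity pruning through the ccp_alpha parameter. Below is a minimal sketch, assuming scikit-learn >= 0.22 and reusing X_train/y_train/X_test/y_test from above; in practice the pruning strength would be chosen by cross-validating over the candidate alphas rather than picking an arbitrary value.

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# grow an unrestricted tree and compute the effective alphas along its pruning path
full_tree = DecisionTreeClassifier(random_state=0)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# pick one alpha for illustration only (the midpoint of the path);
# normally you would cross-validate over path.ccp_alphas to choose it
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]

# refit with post-pruning: larger ccp_alpha values prune more aggressively
pruned_tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
pruned_tree.fit(X_train, y_train)
print(classification_report(y_test, pruned_tree.predict(X_test)))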

Regarding scikit-learn - pruning and boosting in decision trees, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31231499/
