python - Building a decision tree

Tags: python machine-learning decision-tree

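When building a decision tree, at each node we pick the best feature and then the best split point for that feature. But what should I do when the best feature has the value 0 for every sample in the current node/set? All samples keep getting grouped onto one side (the <= 0 branch), and the recursion never terminates. For example: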

#left: 1500, #right: 0

Then,

#left: 1500, #right: 0

and so on...

FYI, I am following the pseudocode below.

GrowTree(S)
if (y_i = C for all i in S and some class C) then {
    return new leaf(C)
} else {
    choose best splitting feature j and splitting point beta (*)
    // (*) I choose the one that gives me the max entropy drop
    S_l = {i : X_ij < beta}
    S_r = {i : X_ij >= beta}
    return new node(j, beta, GrowTree(S_l), GrowTree(S_r))
}

Best answer

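This should not be possible. You should choose the threshold that gives the largest increase in the model's certainty, i.e. the largest drop in impurity/entropy. A threshold that puts every instance into the same branch increases certainty by exactly 0, so it is never a better split than any alternative. It can only come out as the "best" split when the impurity/entropy of the node is already 0, but that is precisely the stopping criterion for creating a leaf in a decision tree. If the labels are still mixed and yet no threshold separates the samples (for example, samples with identical feature values but different classes), stop as well and return a leaf labeled with the majority class.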
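To make the stopping rule concrete, here is a minimal Python sketch of the GrowTree pseudocode from the question. It is an illustration under stated assumptions, not the asker's actual code: the names `entropy`, `best_split` and `grow_tree`, the list-of-rows data layout, and the nested-dict tree are all hypothetical choices. The key idea is that only thresholds leaving both sides non-empty are considered, so every child is strictly smaller than its parent and the recursion must terminate.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y):
    """Return (gain, j, beta) for the split with the largest entropy drop.

    Returns None when every feature is constant in this node, i.e. when
    no threshold can put samples on both sides.
    """
    n, parent = len(y), entropy(y)
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Each distinct value except the smallest is a candidate beta, so
        # "X_ij < beta" and "X_ij >= beta" are both guaranteed non-empty.
        for beta in values[1:]:
            left = [y[i] for i in range(n) if X[i][j] < beta]
            right = [y[i] for i in range(n) if X[i][j] >= beta]
            children = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            gain = parent - children
            if best is None or gain > best[0]:
                best = (gain, j, beta)
    return best


def grow_tree(X, y):
    """Recursively build a tree of nested dicts; leaves are {"leaf": class}."""
    # Stopping criterion 1: the node is pure (entropy is already 0).
    if len(set(y)) == 1:
        return {"leaf": y[0]}
    split = best_split(X, y)
    # Stopping criterion 2: no threshold separates the samples
    # (e.g. the chosen feature is 0 everywhere), so use the majority class.
    if split is None:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    _, j, beta = split
    left = [i for i in range(len(y)) if X[i][j] < beta]
    right = [i for i in range(len(y)) if X[i][j] >= beta]
    return {
        "feature": j,
        "beta": beta,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left]),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right]),
    }
```

With this guard, a node whose only feature is 0 for every sample, such as `grow_tree([[0], [0], [0]], ["a", "a", "b"])`, returns the leaf `{"leaf": "a"}` instead of recursing on the same 3 samples forever. A stricter variant could also stop whenever the best gain falls below a small threshold, at the cost of giving up on splits that only pay off one level deeper.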

Regarding python - building a decision tree, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36457849/
