R: How to calculate sensitivity and specificity of an rpart tree

Tags: r statistics regression

library(rpart)
library(rpart.plot)  # provides prp() used below to plot the tree
train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels=c("very inactive", "inactive", "active", "very active"),
                                      ordered=TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train, method = "class", minsplit = 2, minbucket = 1, cp=-1)
prp(mytree, type = 4, extra = 101, leaf.round = 0, fallen.leaves = TRUE, 
    varlen = 0, tweak = 1.2)

(image: the classification tree plotted with prp)

Then, using printcp, I can see the cross-validation results:
> printcp(mytree)

Classification tree:
rpart(formula = Fraud ~ RearEnd + Whiplash + Activity, data = train, 
    method = "class", minsplit = 2, minbucket = 1, cp = -1)

Variables actually used in tree construction:
[1] Activity RearEnd  Whiplash

Root node error: 5/10 = 0.5

n= 10 

    CP nsplit rel error xerror xstd
1  0.6      0       1.0    2.0  0.0
2  0.2      1       0.4    0.4  0.3
3 -1.0      3       0.0    0.4  0.3

So the root node error is 0.5, which as I understand it is the misclassification error. However, I'm having trouble calculating the sensitivity (the proportion of true positives correctly identified) and the specificity (the proportion of true negatives correctly identified). How can I compute these values from the rpart output?
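
For reference, this is roughly the confusion-table arithmetic I'm trying to reproduce (just a sketch using the fitted mytree above, with predict(type = "class") and base R table()):

# sketch: predicted classes cross-tabulated against the observed labels
pred_class <- predict(mytree, train, type = "class")
tab <- table(predicted = pred_class, actual = train$Fraud)
tab["TRUE", "TRUE"]   / sum(tab[, "TRUE"])    # sensitivity = TP / (TP + FN)
tab["FALSE", "FALSE"] / sum(tab[, "FALSE"])   # specificity = TN / (TN + FP)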

(The example above comes from http://gormanalysis.com/decision-trees-in-r-using-rpart/ )

Best answer

You can do this with the caret package:

Data:

library(rpart)
train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels=c("very inactive", "inactive", "active", "very active"),
                                      ordered=TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train, method = "class", minsplit = 2, minbucket = 1, cp=-1)

Solution:
library(caret)

#calculate class probabilities; column 2 is the predicted probability of Fraud = TRUE
preds <- predict(mytree, train)

#calculate sensitivity (predictions first, observed values second)
> sensitivity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1

#calculate specificity (same argument order)
> specificity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1

Both sensitivity and specificity take the predictions as the first argument and the observed values (the response variable, i.e. train$Fraud) as the second argument.

According to the documentation, both the predicted and the observed values need to be passed to these functions as factors with the same levels.
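
A minimal sketch of that conversion, under the same setup as above (note that caret treats the first factor level as the "positive" class by default, so the levels are ordered with TRUE first):

pred_class <- predict(mytree, train, type = "class")        # factor of predicted labels
pred_f <- factor(pred_class, levels = c("TRUE", "FALSE"))   # same levels for both...
obs_f  <- factor(train$Fraud, levels = c("TRUE", "FALSE"))  # ...with TRUE as the first level
sensitivity(pred_f, obs_f)
specificity(pred_f, obs_f)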

In this case both sensitivity and specificity are 1 because the predictions are 100% accurate.
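
As a related option (a sketch, not part of the original answer), caret::confusionMatrix reports the 2x2 table together with sensitivity and specificity in a single call; the positive argument makes explicit which level counts as the event of interest:

pred_class <- predict(mytree, train, type = "class")
obs_class  <- factor(train$Fraud, levels = levels(pred_class))  # matching factor levels
confusionMatrix(data = pred_class, reference = obs_class, positive = "TRUE")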

Related question on Stack Overflow: https://stackoverflow.com/questions/31094473/
