r - Calibration of posterior probabilities

Tags: r algorithm statistics probability calibration

I am currently working on probability calibration. I am using a calibration method called the rescaling algorithm. Source: http://lem.cnrs.fr/Portals/2/actus/DP_201106.pdf (page 7).

The function I wrote is:

rescaling_fun = function(x, y, z) {

    P_korg  = z # yhat_test_prob$BAD

    P_k_C1  = sum(as.numeric(y) - 1)/length(y) # testset$BAD
    P_kt_C1 = sum(as.numeric(x) - 1)/length(x) # trainset$BAD
    P_k_C0  = sum(abs(as.numeric(y) - 2))/length(y)
    P_kt_C0 = sum(abs(as.numeric(x) - 2))/length(x)

    P_new <- ((P_k_C1/P_kt_C1) * P_korg)/((P_k_C0/P_k_C0) * (1 - P_korg) + (P_k_C0/P_k_C1) * (P_korg))

  return(P_new)
}

The inputs are:

1. x - train_set$BAD (actuals of `train set`)
2. y - test_set$BAD (actuals of `test set`)
3. z - yhat_test_prob$BAD (prediction on `test set`)

Problem: the resulting values do not fall within the range 0 to 1. Can you help me fix this?

Best answer

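The formulas you use to obtain the probabilities (P_k_C1 ...) need to be modified. For example, according to the paper, y is a binary variable (0, 1). With such a y, the expression sum(y - 1)/length(y) will most likely be negative: it turns the y values into -1 or 0 and then sums them. I think it should be (sum(y) - 1)/length(y). Here is an example.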

set.seed(1237)
y <- sample(0:1, 10, replace = T)
y
[1] 0 1 0 0 0 1 1 0 1 1
# sum(y - 1) cannot be positive: y is 0 or 1, so y - 1 is -1 or 0
sum(as.numeric(y) - 1)/length(y)
[1] -0.5
# modification 
(sum(as.numeric(y)) - 1)/length(y)
[1] 0.4
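
Beyond the sign issue, the denominator in the question's code uses P_k_C0/P_k_C0 (which is always 1) and P_k_C0/P_k_C1, which do not match the P_k_C1/P_kt_C1 ratio used in the numerator. The following is a minimal sketch of what a consistent version might look like, assuming the rescaling formula is the usual prior-shift correction (test prior divided by train prior, per class); I have not checked this against page 7 of the paper. The names class1_share and rescale_prob are hypothetical, and the sketch estimates the class-1 share as a plain proportion, mean(y), which is an assumption that differs from the (sum(y) - 1)/length(y) suggested above.

```r
# Hypothetical helper: share of the positive class in a label vector.
# Accepts 0/1 numerics or a two-level factor (second level = positive class).
class1_share <- function(labels) {
  if (is.factor(labels)) {
    mean(as.integer(labels) == 2L)  # factor codes are 1 and 2, not 0 and 1
  } else {
    mean(labels == 1)
  }
}

# Sketch of a prior-shift rescaling (assumed form, not confirmed from the paper).
rescale_prob <- function(train_labels, test_labels, p_orig) {
  prior_train_1 <- class1_share(train_labels)
  prior_test_1  <- class1_share(test_labels)

  ratio_1 <- prior_test_1 / prior_train_1              # class 1: test prior / train prior
  ratio_0 <- (1 - prior_test_1) / (1 - prior_train_1)  # class 0: test prior / train prior

  (ratio_1 * p_orig) / (ratio_0 * (1 - p_orig) + ratio_1 * p_orig)
}

# Made-up illustration data, not taken from the question.
train_labels <- c(0, 0, 0, 0, 1, 1, 1, 1)  # 50% positives
test_labels  <- factor(c(0, 0, 0, 1))      # 25% positives
p_orig       <- c(0.1, 0.5, 0.9)

p_new <- rescale_prob(train_labels, test_labels, p_orig)
print(p_new)
```

As long as both classes appear in the train and test labels, both ratios are positive and the numerator is one of the two non-negative terms of the denominator, so the result stays between 0 and 1.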

This question, r - Calibration of posterior probabilities, corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/29948919/
