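While trying to fit a lognormal distribution to truncated data, I found the following two Stack Overflow posts and followed them:
Fitting a lognormal distribution to truncated data in R
Fitting a truncated lognormal distribution in R
However, that solution no longer seems to work, because the dtrunc and ptrunc functions from the truncdist package no longer pass the test function in fitdistrplus.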
library(fitdistrplus)
library(truncdist)
dtruncated_log_normal <- function(x, a,b, meanlog, sdlog)
dtrunc(x, "lnorm", a=a, b=b, meanlog=meanlog, sdlog=sdlog)
ptruncated_log_normal <- function(q, a,b, meanlog, sdlog)
ptrunc(q, "lnorm", a=a, b=b, meanlog=meanlog, sdlog=sdlog)
fit <- fitdist(s, "truncated_log_normal", start=list(a=0.001, b=90, meanlog=mean(log(s)), sdlog=sd(log(s))))
The test function reports essentially all of its errors, and the call returns:
Error in fitdist(s, "truncated_log_normal", start = list(a = 0.001, b = 90, :
the function mle failed to estimate the parameters,
with the error code 100
In addition: Warning messages:
1: In fitdist(s, "truncated_log_normal", start = list(a = 0.001, b = 90, :
The dtruncated_log_normal function should return a vector of with NaN values when input has inconsistent values and not raise an error
2: In fitdist(s, "truncated_log_normal", start = list(a = 0.001, b = 90, :
The ptruncated_log_normal function should return a vector of with NaN values when input has inconsistent parameters and not raise an error
Sample of the vector, which has more than 2000 elements:
> dput(head(s,150))
c(88.443, 89.296, 89.327, 87.776, 89.405, 89.824, 89.997, 87.678,
89.665, 88.814, 88.841, 89.728, 89.365, 89.476, 89.189, 88.251,
88.939, 89.945, 89.567, 89.613, 89.317, 89.622, 87.674, 89.19,
89.782, 89.891, 89.954, 89.556, 89.093, 89.637, 89.052, 87.395,
87.835, 89.357, 87.733, 89.459, 88.197, 88.539, 88.564, 87.857,
88.74, 88.955, 89.691, 88.102, 89.635, 89.116, 89.584, 88.288,
86.95, 89.182, 89.435, 88.93, 87.567, 89.083, 88.52, 88.897,
89.54, 88.557, 89.269, 89.854, 89.31, 88.274, 89.126, 89.431,
88.257, 88.872, 88.978, 89.03, 87.434, 88.305, 89.656, 87.556,
89.209, 89.508, 87.781, 88.068, 89.933, 87.256, 88.906, 89.067,
88.92, 87.947, 88.196, 88.951, 89.594, 88.378, 87.482, 88.817,
89.65, 89.392, 89.932, 87.896, 89.909, 89.265, 89.954, 89.827,
87.49, 87.786, 89.208, 89.728, 88.905, 87.566, 86.612, 88.363,
87.457, 87.639, 88.907, 88.425, 87.244, 88.546, 88.221, 89.293,
87.469, 87.31, 89.107, 88.442, 89.133, 88.812, 88.418, 89.456,
88.512, 89.514, 87.446, 88.374, 89.282, 87.415, 89.004, 87.627,
89.107, 89.168, 89.589, 89.288, 88.496, 89.807, 87.518, 88.796,
88.001, 87.322, 87.353, 88.055, 88.81, 88.456, 87.876, 87.7,
88.675, 88.996, 89.479, 86.781, 86.928, 87.356)
Is there a solution in 2023? My data has already been cleaned and contains no inconsistent, empty, NA, or otherwise strange values.
Best answer
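The first problem is the constraints on the parameters: 0 < a < min(s) < max(s) < b and 0 < sdlog are requirements of the model. The optimizer doesn't know this and runs into trouble, because values that violate the constraints generate lots of errors.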
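One way to deal with this kind of problem is to reparameterize so that the parameters are unconstrained. For example, you could use log(a) as a parameter, since exponentiating it always gives a positive a, use log(b-a) as another parameter to guarantee b > a, and use log(sdlog) as another.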
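As a minimal sketch of that reparameterization (the helper names to_unconstrained and to_constrained are hypothetical, not part of fitdistrplus or truncdist), the mapping between the two scales might look like this:

```r
# Hypothetical helpers illustrating the log reparameterization.
# Constrained scale:   a > 0, b > a, sdlog > 0
# Unconstrained scale: loga, logbminusa, logsdlog can be any real number
to_unconstrained <- function(a, b, meanlog, sdlog) {
  stopifnot(a > 0, b > a, sdlog > 0)
  c(loga = log(a), logbminusa = log(b - a),
    meanlog = meanlog, logsdlog = log(sdlog))
}

to_constrained <- function(loga, logbminusa, meanlog, logsdlog) {
  a <- exp(loga)
  c(a = a, b = a + exp(logbminusa),
    meanlog = meanlog, sdlog = exp(logsdlog))
}

# Round trip with illustrative values (assumed, not fitted)
theta <- to_unconstrained(a = 85, b = 90, meanlog = 4.49, sdlog = 0.013)
back  <- do.call(to_constrained, as.list(theta))
back

# Any real inputs map to a valid (a, b, sdlog) triple
wild <- to_constrained(-3, 10, 0, -50)
```

Note that this only enforces a > 0, b > a and sdlog > 0. The requirement that the data lie inside [a, b] still has to be handled by the density itself, for example by returning 0 for points outside the interval.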
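The next problem is harder. For some of the values the optimizer tries, the probability that the untruncated distribution falls within the truncation interval evaluates to zero. Specifically, I saw this while debugging with a = 0.001, b = 90, sdlog = 0.009529981 (your starting values) and meanlog = 5.176146 (a little bigger than the starting value).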
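The probability isn't actually zero; it's a small value that rounds down to zero. The solution is to work with log probabilities instead of probabilities, but I don't think you have that option: the code where the problem occurs is in truncdist, but the parameters come from fitdistrplus. The two packages would need to work together to fix this, or perhaps you could write specialized versions of dtrunc and ptrunc to do it.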
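The underflow can be reproduced with base R alone. The sketch below uses the parameter values quoted above and compares the interval probability on the ordinary scale with the endpoint CDF values on the log scale (plnorm's log.p argument):

```r
# Parameter values quoted in the answer
a       <- 0.001
b       <- 90
meanlog <- 5.176146
sdlog   <- 0.009529981

# Ordinary scale: both CDF values underflow, so the interval
# probability used as the truncation denominator is exactly 0
p <- plnorm(c(a, b), meanlog, sdlog)
denom <- p[2] - p[1]
denom

# Log scale: the same CDF values are large negative but finite numbers
logp <- plnorm(c(a, b), meanlog, sdlog, log.p = TRUE)
logp
```

Dividing a density by denom produces NaN or Inf, whereas the log-scale values can still be combined safely.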
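Edited to add:
The basic idea used by dtrunc is that the truncated density equals the regular density divided by the probability of being in the truncation interval, i.e. d/(pb - pa), where d is the full density and pb and pa are the CDF values at the endpoints. The numerical problem is that pb == pa because of rounding.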
The way to fix this is to rescale everything and work on the log scale. That is, use (d/pb)/(1 - pa/pb) = exp(log(d) - log(pb) - log1p(-exp(log(pa) - log(pb)))).
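Written out step by step, with f and F denoting the untruncated lognormal density and CDF (notation introduced here for illustration; d = f(x), pa = F(a), pb = F(b)), the identity reads:

```latex
\begin{align*}
f_{\mathrm{trunc}}(x)
  &= \frac{f(x)}{F(b) - F(a)}
   = \frac{f(x)/F(b)}{1 - F(a)/F(b)}, \qquad a \le x \le b, \\
\log f_{\mathrm{trunc}}(x)
  &= \log f(x) - \log F(b)
     - \operatorname{log1p}\!\bigl(-\exp\bigl(\log F(a) - \log F(b)\bigr)\bigr).
\end{align*}
```

Every term on the right can be computed directly on the log scale, so nothing has to pass through a probability that has already underflowed to zero.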
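The code below does this and also solves the first problem. It doesn't use the truncdist package; it does the calculations with base functions instead.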
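But that's not enough! The problem now is that fitdist forces optim to compute the Hessian matrix, and numerical issues cause that to fail. There is a workaround: I'll define a "custom" optimization function that is just the regular optim() with the hessian argument forced to FALSE.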
Here are the results. Sorry, no standard errors.
library(fitdistrplus)
#> Loading required package: MASS
#> Loading required package: survival
s <- c(88.443, 89.296, 89.327, 87.776, 89.405, 89.824, 89.997, 87.678,
89.665, 88.814, 88.841, 89.728, 89.365, 89.476, 89.189, 88.251,
88.939, 89.945, 89.567, 89.613, 89.317, 89.622, 87.674, 89.19,
89.782, 89.891, 89.954, 89.556, 89.093, 89.637, 89.052, 87.395,
87.835, 89.357, 87.733, 89.459, 88.197, 88.539, 88.564, 87.857,
88.74, 88.955, 89.691, 88.102, 89.635, 89.116, 89.584, 88.288,
86.95, 89.182, 89.435, 88.93, 87.567, 89.083, 88.52, 88.897,
89.54, 88.557, 89.269, 89.854, 89.31, 88.274, 89.126, 89.431,
88.257, 88.872, 88.978, 89.03, 87.434, 88.305, 89.656, 87.556,
89.209, 89.508, 87.781, 88.068, 89.933, 87.256, 88.906, 89.067,
88.92, 87.947, 88.196, 88.951, 89.594, 88.378, 87.482, 88.817,
89.65, 89.392, 89.932, 87.896, 89.909, 89.265, 89.954, 89.827,
87.49, 87.786, 89.208, 89.728, 88.905, 87.566, 86.612, 88.363,
87.457, 87.639, 88.907, 88.425, 87.244, 88.546, 88.221, 89.293,
87.469, 87.31, 89.107, 88.442, 89.133, 88.812, 88.418, 89.456,
88.512, 89.514, 87.446, 88.374, 89.282, 87.415, 89.004, 87.627,
89.107, 89.168, 89.589, 89.288, 88.496, 89.807, 87.518, 88.796,
88.001, 87.322, 87.353, 88.055, 88.81, 88.456, 87.876, 87.7,
88.675, 88.996, 89.479, 86.781, 86.928, 87.356)
dtruncated_log_normal <- function(x, loga, logbminusa, meanlog, logsdlog) {
a <- exp(loga)
b <- exp(logbminusa) + a
goodvals <- is.finite(x) & a <= x & x <= b
sdlog <- exp(logsdlog)
# cat("parms:", c(min(x[goodvals]), max(x[goodvals]), loga, a, logbminusa, b, meanlog, sdlog), "\n")
# Log CDF at both truncation endpoints (plnorm's argument is named log.p)
logpab <- plnorm(c(a,b), meanlog, sdlog, log.p = TRUE)
result <- rep(0, length(x))
logd <- dlnorm(x[goodvals], meanlog, sdlog, log = TRUE)
result[goodvals] <- exp(logd - logpab[2] - log1p(-exp(logpab[1]- logpab[2])))
result
}
ptruncated_log_normal <- function(q, loga, logbminusa, meanlog, logsdlog) rep(NaN, length(q))
myoptim <- function(..., hessian) {
optim(..., hessian = FALSE)
}
fit <- fitdist(s, "truncated_log_normal", start=list(loga=log(min(s) - 1), logbminusa=log(max(s) - min(s) + 2), meanlog=mean(log(s)), logsdlog=log(sd(log(s)))), custom.optim = myoptim)
fit
#> Fitting of the distribution ' truncated_log_normal ' by maximum likelihood
#> Parameters:
#> estimate Std. Error
#> loga 4.439900 NA
#> logbminusa 1.654508 NA
#> meanlog 4.490567 NA
#> logsdlog -4.354114 NA
# Convert back to the original scale:
ests <- as.list(fit$estimate)
with(ests, {
a <- exp(loga)
b <- exp(logbminusa) + a
sdlog <- exp(logsdlog)
c(a = a, b = b, meanlog = meanlog, sdlog = sdlog)
})
#> a b meanlog sdlog
#> 84.76649227 89.99700006 4.49056699 0.01285382
Created on 2023-03-19 with reprex v2.0.2
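As a quick sanity check, one could verify that the fitted truncated density integrates to about 1 over [a, b]. The sketch below restates the log-scale density on the original parameter scale (dtlnorm_check is a hypothetical name) and plugs in the back-transformed estimates printed above:

```r
# Compact restatement of the log-scale truncated lognormal density,
# parameterized directly by (a, b, meanlog, sdlog) for this check only
dtlnorm_check <- function(x, a, b, meanlog, sdlog) {
  logp <- plnorm(c(a, b), meanlog, sdlog, log.p = TRUE)
  logd <- dlnorm(x, meanlog, sdlog, log = TRUE)
  dens <- exp(logd - logp[2] - log1p(-exp(logp[1] - logp[2])))
  ifelse(x >= a & x <= b, dens, 0)  # zero outside the truncation interval
}

# Back-transformed estimates copied from the output above
est <- c(a = 84.76649227, b = 89.99700006,
         meanlog = 4.49056699, sdlog = 0.01285382)

area <- integrate(dtlnorm_check,
                  lower = est[["a"]], upper = est[["b"]],
                  a = est[["a"]], b = est[["b"]],
                  meanlog = est[["meanlog"]], sdlog = est[["sdlog"]])$value
area  # should be close to 1 if the density is normalized correctly
```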
Source: the Stack Overflow question "r - Fitting a truncated lognormal distribution with fitdistrplus": https://stackoverflow.com/questions/75781356/