R 包中找不到可用的方差函数
从 gam-object
进行预测时使用 predict
函数
之前构建的(mgcv 1.8-31)。
R 包应(除其他外)根据 gam 对象进行预测。全部
先前构建的模型使用高斯系列并有其
自己的方差函数。一些方差函数只是线性效应
变量,其他人使用更复杂的自定义函数。这
模型和方差函数存储在“sysdata.rda”文件中
将它们包含在包中。该包已使用 devtools
进行了记录
和roxygen2
。
考虑以下两个 GAM 的最小示例:
library(mgcv)
set.seed(123)
var.fun <- function(x){x^2}
x <- runif(100)
y <- x + rnorm(100, 0, var.fun(x))
mod.gam.1 <- gam(formula = list(y ~ x,
~ var.fun(x)),
family = gaulss(link = list("log", "logb")))
mod.gam.2 <- gam(formula = list(y ~ x,
~ I(x^2)),
family = gaulss(link = list("log", "logb")))
第一个模型使用自定义方差函数。第二个模型有 模型调用中硬编码的方差公式。
模型和方差函数随后存储在“sysdata.rda”中 文件,它包含在名为“gamvarfun”的包中(我知道,命名 东西...),如下:
save(mod.gam.1, var.fun, mod.gam.2,
file = "~/gamvarfun/R/sysdata.rda")
现在两个函数被添加到包中以检索预测 对应型号:
pred.fun.1 <- function(x){
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
pred.fun.2 <- function(x){
predict(mod.gam.2,
newdata = data.frame("x" = x))
}
为了说明问题,我上传了一个非常基本的演示包 到github,所以下面的代码应该可以工作:
library(devtools)
install_github("jan-schick/gam_issue")
library(gamvarfun)
pred.fun.2(1)
# 0.3135275 0.5443409
pred.fun.1(1)
# Error in var.fun(x) : could not find function "var.fun"
# Writing var.fun to global environment
var.fun <- gamvarfun:::var.fun
pred.fun.1(1)
# 0.3135275 0.5443409
使用 pred.fun.1
(包含自定义方差函数)时,
显示错误。但是,pred.fun.2
(硬编码方差函数)
工作得很好。当
使用 devtools::load_all()
而不是“正确”安装
包裹。我怀疑是环境不同造成的问题
使用 predict.gam
时。我通过编写自定义测试了这个假设
在调用 pred.fun.1 之前对全局环境的方差函数
(见上文),有效。然而,这显然不是解决问题的办法
包。
在早期的尝试中,我尝试在包内声明该函数, 例如将上面编写的代码直接放在预测中 功能。安装包后这也不起作用。
pred.fun.1 <- function(x){
var.fun <- function(x){x^2}
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
我还尝试过在同一位置(在 预测函数)没有任何成功。
我发现的唯一可行的解决方案是导出方差函数 并使其成为包命名空间/API 的一部分。这是不可行的 在这种情况下的解决方案,因为它会导致许多可见的差异 功能,对用户没有实际好处。
然后有一个明显的解决方法:替换方差函数
通过模型调用中的原始公式,即使用 mod.gam.2
而不是 mod.gam.1
。然而,这不是一个正确的解决方案
要么。
咨询了各种搜索引擎和同事均无果。
因此,如果有任何关于如何解决这个问题的提示,我将不胜感激 问题
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gamvarfun_0.1.0 usethis_1.5.0 devtools_2.0.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 splines_3.6.2
[5] pkgload_1.0.2 lattice_0.20-38 R6_2.4.0 rlang_0.4.0
[9] tools_3.6.2 grid_3.6.2 pkgbuild_1.0.3 nlme_3.1-143
[13] mgcv_1.8-31 sessioninfo_1.1.1 cli_1.1.0 withr_2.1.2
[17] remotes_2.0.4 assertthat_0.2.1 digest_0.6.20 rprojroot_1.3-2
[21] crayon_1.3.4 Matrix_1.2-18 processx_3.3.1 callr_3.2.0
[25] fs_1.3.1 ps_1.3.0 curl_4.0 testthat_2.1.1
[29] memoise_1.1.0 glue_1.3.1 compiler_3.6.2 desc_1.2.0
[33] backports_1.1.4 prettyunits_1.0.2
最佳答案
问题在于公式具有与其关联的环境,并且您正在保存与 globalenv()
的关联,但将函数放在其他地方。
如果您坚持使用 sysdata.rda
,这个问题并不容易解决,因为 mod.gam.1
对象将该环境复制到多个位置。仅修补 environment(mod.gam.1$formula)
是不够的。
我认为唯一可行的方法是将生成 mod.gam.1
对象的源代码包含在包源代码中。如果公式是在从包命名空间继承的上下文中定义的,那么将找到那里定义的方差函数。
编辑添加:我尝试了此策略,但无法使其发挥作用。看起来好像 mgcv
包中存在错误,特别是在解释 gualss
模型使用的特殊对偶公式的代码中。我会看看是否能找到解决方法。
第二次编辑:运行下面的代码似乎可以修复 mgcv
中存在错误的函数。它们基于 mgcv
版本 1.8-31。我不建议运行此脚本,除非您完全安装了该版本。我已经向软件包维护者发送了一条消息;也许这些将被合并到 future 的版本中。
interpret.gam0 <- function(gf,textra=NULL,extra.special=NULL)
# interprets a gam formula of the generic form:
# y~x0+x1+x3*x4 + s(x5)+ s(x6,x7) ....
# and returns:
# 1. a model formula for the parametric part: pf (and pfok indicating whether it has terms)
# 2. a list of descriptors for the smooths: smooth.spec
# this is function does the work, and is called by in interpret.gam
# 'textra' is optional text to add to term labels
# 'extra.special' is label of extra smooth within formula.
{ p.env <- environment(gf) # environment of formula
tf <- terms.formula(gf,specials=c("s","te","ti","t2",extra.special)) # specials attribute indicates which terms are smooth
terms <- attr(tf,"term.labels") # labels of the model terms
nt <- length(terms) # how many terms?
if (attr(tf,"response") > 0) { # start the replacement formulae
response <- as.character(attr(tf,"variables")[2])
} else {
response <- NULL
}
sp <- attr(tf,"specials")$s # array of indices of smooth terms
tp <- attr(tf,"specials")$te # indices of tensor product terms
tip <- attr(tf,"specials")$ti # indices of tensor product pure interaction terms
t2p <- attr(tf,"specials")$t2 # indices of type 2 tensor product terms
zp <- if (is.null(extra.special)) NULL else attr(tf,"specials")[[extra.special]]
off <- attr(tf,"offset") # location of offset in formula
## have to translate sp, tp, tip, t2p (zp) so that they relate to terms,
## rather than elements of the formula...
vtab <- attr(tf,"factors") # cross tabulation of vars to terms
if (length(sp)>0) for (i in 1:length(sp)) {
ind <- (1:nt)[as.logical(vtab[sp[i],])]
sp[i] <- ind # the term that smooth relates to
}
if (length(tp)>0) for (i in 1:length(tp)) {
ind <- (1:nt)[as.logical(vtab[tp[i],])]
tp[i] <- ind # the term that smooth relates to
}
if (length(tip)>0) for (i in 1:length(tip)) {
ind <- (1:nt)[as.logical(vtab[tip[i],])]
tip[i] <- ind # the term that smooth relates to
}
if (length(t2p)>0) for (i in 1:length(t2p)) {
ind <- (1:nt)[as.logical(vtab[t2p[i],])]
t2p[i] <- ind # the term that smooth relates to
}
if (length(zp)>0) for (i in 1:length(zp)) {
ind <- (1:nt)[as.logical(vtab[zp[i],])]
zp[i] <- ind # the term that smooth relates to
} ## re-referencing is complete
k <- kt <- kti <- kt2 <- ks <- kz <- kp <- 1 # counters for terms in the 2 formulae
len.sp <- length(sp)
len.tp <- length(tp)
len.tip <- length(tip)
len.t2p <- length(t2p)
len.zp <- length(zp)
ns <- len.sp + len.tp + len.tip + len.t2p + len.zp# number of smooths
pav <- av <- rep("",0)
smooth.spec <- list()
#mgcvat <- "package:mgcv" %in% search() ## is mgcv in search path?
mgcvns <- loadNamespace('mgcv')
if (nt) for (i in 1:nt) { # work through all terms
if (k <= ns&&((ks<=len.sp&&sp[ks]==i)||(kt<=len.tp&&tp[kt]==i)||(kz<=len.zp&&zp[kz]==i)||
(kti<=len.tip&&tip[kti]==i)||(kt2<=len.t2p&&t2p[kt2]==i))) { # it's a smooth
## have to evaluate in the environment of the formula or you can't find variables
## supplied as smooth arguments, e.g. k <- 5;gam(y~s(x,k=k)), fails,
## but if you don't specify namespace of mgcv then stuff like
## loadNamespace('mgcv'); k <- 10; mgcv::interpret.gam(y~s(x,k=k)) fails (can't find s)
## eval(parse(text=terms[i]),envir=p.env,enclos=loadNamespace('mgcv')) fails??
## following may supply namespace of mgcv explicitly if not on search path...
## If 's' etc are masked then we can fail even if mgcv on search path, hence paste
## of explicit mgcv reference into first attempt...
st <- try(eval(parse(text=paste("mgcv::",terms[i],sep="")),envir=p.env),silent=TRUE)
if (inherits(st,"try-error")) st <-
eval(parse(text=terms[i]),enclos=p.env,envir=mgcvns)
if (!is.null(textra)) { ## modify the labels on smooths with textra
pos <- regexpr("(",st$lab,fixed=TRUE)[1]
st$label <- paste(substr(st$label,start=1,stop=pos-1),textra,
substr(st$label,start=pos,stop=nchar(st$label)),sep="")
}
smooth.spec[[k]] <- st
if (ks<=len.sp&&sp[ks]==i) ks <- ks + 1 else # counts s() terms
if (kt<=len.tp&&tp[kt]==i) kt <- kt + 1 else # counts te() terms
if (kti<=len.tip&&tip[kti]==i) kti <- kti + 1 else # counts ti() terms
if (kt2<=len.t2p&&t2p[kt2]==i) kt2 <- kt2 + 1 # counts t2() terms
else kz <- kz + 1
k <- k + 1 # counts smooth terms
} else { # parametric
av[kp] <- terms[i] ## element kp on rhs of parametric
kp <- kp+1 # counts parametric terms
}
}
if (!is.null(off)) { ## deal with offset
av[kp] <- as.character(attr(tf,"variables")[1+off])
kp <- kp+1
}
pf <- paste(response,"~",paste(av,collapse=" + "))
if (attr(tf,"intercept")==0) {
pf <- paste(pf,"-1",sep="")
if (kp>1) pfok <- 1 else pfok <- 0
} else {
pfok <- 1;if (kp==1) {
pf <- paste(pf,"1");
}
}
fake.formula <- pf
if (length(smooth.spec)>0)
for (i in 1:length(smooth.spec)) {
nt <- length(smooth.spec[[i]]$term)
ff1 <- paste(smooth.spec[[i]]$term[1:nt],collapse="+")
fake.formula <- paste(fake.formula,"+",ff1)
if (smooth.spec[[i]]$by!="NA") {
fake.formula <- paste(fake.formula,"+",smooth.spec[[i]]$by)
av <- c(av,smooth.spec[[i]]$term,smooth.spec[[i]]$by)
} else av <- c(av,smooth.spec[[i]]$term)
}
fake.formula <- as.formula(fake.formula,p.env)
if (length(av)) {
pred.formula <- as.formula(paste("~",paste(av,collapse="+")))
pav <- all.vars(pred.formula) ## trick to strip out 'offset(x)' etc...
pred.formula <- reformulate(pav)
environment(pred.formula) <- environment(gf)
} else pred.formula <- ~1
ret <- list(pf=as.formula(pf,p.env),pfok=pfok,smooth.spec=smooth.spec,
fake.formula=fake.formula,response=response,fake.names=av,
pred.names=pav,pred.formula=pred.formula)
class(ret) <- "split.gam.formula"
ret
} ## interpret.gam0
interpret.gam <- function(gf,extra.special=NULL) {
## wrapper to allow gf to be a list of formulae or
## a single formula. This facilitates general penalized
## likelihood models in which several linear predictors
## may be involved...
##
## The list syntax is as follows. The first formula must have a response on
## the lhs, rather than labels. For m linear predictors, there
## must be m 'base formulae' in linear predictor order. lhs labels will
## be ignored in a base formula. Empty base formulae have '-1' on rhs.
## Further formulae have labels up to m labels 1,...,m on the lhs, in a
## syntax like this: 3 + 5 ~ s(x), which indicates that the same s(x)
## should be added to both linear predictors 3 and 5.
## e.g. A bivariate normal model with common expected values might be
## list(y1~-1,y2~-1,1+2~s(x)), whereas if the second component was contaminated
## by something else we might have list(y1~-1,y2~s(v)-1,1+2~s(x))
##
## For a list argument, this routine returns a list of split.formula objects
## with an extra field "lpi" indicating the linear predictors to which each
## contributes...
if (is.list(gf)) {
d <- length(gf)
## make sure all formulae have a response, to avoid
## problems with parametric sub formulae of the form ~1
#if (length(gf[[1]])<3) stop("first formula must specify a response")
resp <- gf[[1]][2]
ret <- list()
pav <- av <- rep("",0)
nlp <- 0 ## count number of linear predictors (may be different from number of formulae)
for (i in 1:d) {
textra <- if (i==1) NULL else paste(".",i-1,sep="") ## modify smooth labels to identify to predictor
lpi <- getNumericResponse(gf[[i]]) ## get linear predictors to which this applies, if explicit
if (length(lpi)==1) warning("single linear predictor indices are ignored")
if (length(lpi)>0) gf[[i]][[2]] <- NULL else { ## delete l.p. labels from formula response
nlp <- nlp + 1;lpi <- nlp ## this is base formula for l.p. number nlp
}
ret[[i]] <- interpret.gam0(gf[[i]],textra,extra.special=extra.special)
ret[[i]]$lpi <- lpi ## record of the linear predictors to which this applies
## make sure all parametric formulae have a response, to avoid
## problems with parametric sub formulae of the form ~1
respi <- rep("",0) ## no extra response terms
if (length(ret[[i]]$pf)==2) {
ret[[i]]$pf[3] <- ret[[i]]$pf[2];ret[[i]]$pf[2] <- resp
respi <- rep("",0)
} else if (i>1) respi <- ret[[i]]$response ## extra response terms
av <- c(av,ret[[i]]$fake.names,respi) ## accumulate all required variable names
pav <- c(pav,ret[[i]]$pred.names) ## predictors only
}
av <- unique(av) ## strip out duplicate variable names
pav <- unique(pav)
if (length(av)>0) {
## work around - reformulate with response = "log(x)" will treat log(x) as a name,
## not the call it should be...
fff <- formula(paste(ret[[1]]$response,"~ ."))
ret$fake.formula <- reformulate(av,response=ret[[1]]$response)
environment(ret$fake.formula) <- environment(gf[[1]])
ret$fake.formula[[2]] <- fff[[2]] ## fix messed up response
} else ret$fake.formula <- ret[[1]]$fake.formula ## create fake formula containing all variables
ret$pred.formula <- if (length(pav)>0) reformulate(pav) else ~1 ## predictor only formula
environment(ret$pred.formula) <- environment(gf[[1]])
ret$response <- ret[[1]]$response
ret$nlp <- nlp ## number of linear predictors
for (i in 1:d) if (max(ret[[i]]$lpi)>nlp||min(ret[[i]]$lpi)<1) stop("linear predictor labels out of range")
class(ret) <- "split.gam.formula"
return(ret)
} else interpret.gam0(gf,extra.special=extra.special)
} ## interpret.gam
## Now some test code.
environment(interpret.gam) <- environment(mgcv::interpret.gam)
environment(interpret.gam0) <- environment(mgcv:::interpret.gam0)
assignInNamespace("interpret.gam", interpret.gam, "mgcv")
assignInNamespace("interpret.gam0", interpret.gam0, "mgcv")
set.seed(123)
mod.gam.1 <- local({
var.fun <- function(x){x^2}
x <- runif(100)
y <- x + rnorm(100, 0, var.fun(x))
gam(formula = list(y ~ x,
~ var.fun(x)),
family = gaulss(link = list("log", "logb")))
})
pred.fun.1 <- function(x){
predict(mod.gam.1,
newdata = data.frame("x" = x))
}
pred.fun.1(1)
关于r - 使用包内的自定义方差函数从 gaulss-gams 进行预测时出现环境问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59615362/