r - 使用包内的自定义方差函数从 gaulss-gams 进行预测时出现环境问题

R 包中找不到可用的方差函数从 gam-object 进行预测时使用 predict 函数之前构建的(mgcv 1.8-31)。

R 包应(除其他外)根据 gam 对象进行预测。全部先前构建的模型使用高斯系列并有其自己的方差函数。一些方差函数只是线性效应变量，其他人使用更复杂的自定义函数。这模型和方差函数存储在“sysdata.rda”文件中将它们包含在包中。该包已使用 devtools 进行了记录和roxygen2。

考虑以下两个 GAM 的最小示例:

library(mgcv)

set.seed(123)

var.fun <- function(x){x^2}

x <- runif(100)
y <- x + rnorm(100, 0, var.fun(x))

mod.gam.1 <- gam(formula = list(y ~ x,
                                ~ var.fun(x)),
                 family = gaulss(link = list("log", "logb")))

mod.gam.2 <- gam(formula = list(y ~ x,
                                ~ I(x^2)),
                 family = gaulss(link = list("log", "logb")))

第一个模型使用自定义方差函数。第二个模型有模型调用中硬编码的方差公式。

模型和方差函数随后存储在“sysdata.rda”中文件，它包含在名为“gamvarfun”的包中(我知道，命名东西...)，如下:

save(mod.gam.1, var.fun, mod.gam.2,
     file = "~/gamvarfun/R/sysdata.rda")

现在两个函数被添加到包中以检索预测对应型号:

pred.fun.1 <- function(x){
  predict(mod.gam.1,
          newdata = data.frame("x" = x))
}

pred.fun.2 <- function(x){
  predict(mod.gam.2,
          newdata = data.frame("x" = x))
}

为了说明问题，我上传了一个非常基本的演示包到github，所以下面的代码应该可以工作:

library(devtools)
install_github("jan-schick/gam_issue")
library(gamvarfun)

pred.fun.2(1)
# 0.3135275 0.5443409

pred.fun.1(1)
# Error in var.fun(x) : could not find function "var.fun"

# Writing var.fun to global environment
var.fun <- gamvarfun:::var.fun
pred.fun.1(1)
# 0.3135275 0.5443409

使用 pred.fun.1(包含自定义方差函数)时，显示错误。但是，pred.fun.2(硬编码方差函数) 工作得很好。当使用 devtools::load_all() 而不是“正确”安装包裹。我怀疑是环境不同造成的问题使用 predict.gam 时。我通过编写自定义测试了这个假设在调用 pred.fun.1 之前对全局环境的方差函数 (见上文)，有效。然而，这显然不是解决问题的办法包。

在早期的尝试中，我尝试在包内声明该函数，例如将上面编写的代码直接放在预测中功能。安装包后这也不起作用。

pred.fun.1 <- function(x){
  var.fun <- function(x){x^2}
  predict(mod.gam.1,
          newdata = data.frame("x" = x))
}

我还尝试过在同一位置(在预测函数)没有任何成功。

我发现的唯一可行的解决方案是导出方差函数并使其成为包命名空间/API 的一部分。这是不可行的在这种情况下的解决方案，因为它会导致许多可见的差异功能，对用户没有实际好处。

然后有一个明显的解决方法:替换方差函数通过模型调用中的原始公式，即使用 mod.gam.2 而不是 mod.gam.1。然而，这不是一个正确的解决方案要么。

咨询了各种搜索引擎和同事均无果。

因此，如果有任何关于如何解决这个问题的提示，我将不胜感激问题

R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gamvarfun_0.1.0 usethis_1.5.0   devtools_2.0.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2        rstudioapi_0.10   magrittr_1.5      splines_3.6.2
 [5] pkgload_1.0.2     lattice_0.20-38   R6_2.4.0          rlang_0.4.0
 [9] tools_3.6.2       grid_3.6.2        pkgbuild_1.0.3    nlme_3.1-143
[13] mgcv_1.8-31       sessioninfo_1.1.1 cli_1.1.0         withr_2.1.2
[17] remotes_2.0.4     assertthat_0.2.1  digest_0.6.20     rprojroot_1.3-2
[21] crayon_1.3.4      Matrix_1.2-18     processx_3.3.1    callr_3.2.0
[25] fs_1.3.1          ps_1.3.0          curl_4.0          testthat_2.1.1
[29] memoise_1.1.0     glue_1.3.1        compiler_3.6.2    desc_1.2.0
[33] backports_1.1.4   prettyunits_1.0.2

最佳答案

问题在于公式具有与其关联的环境，并且您正在保存与 globalenv() 的关联，但将函数放在其他地方。

如果您坚持使用 sysdata.rda，这个问题并不容易解决，因为 mod.gam.1 对象将该环境复制到多个位置。仅修补 environment(mod.gam.1$formula) 是不够的。

我认为唯一可行的方法是将生成 mod.gam.1 对象的源代码包含在包源代码中。如果公式是在从包命名空间继承的上下文中定义的，那么将找到那里定义的方差函数。

编辑添加:我尝试了此策略，但无法使其发挥作用。看起来好像 mgcv 包中存在错误，特别是在解释 gualss 模型使用的特殊对偶公式的代码中。我会看看是否能找到解决方法。

第二次编辑:运行下面的代码似乎可以修复 mgcv 中存在错误的函数。它们基于 mgcv 版本 1.8-31。我不建议运行此脚本，除非您完全安装了该版本。我已经向软件包维护者发送了一条消息；也许这些将被合并到 future 的版本中。

interpret.gam0 <- function(gf,textra=NULL,extra.special=NULL)
# interprets a gam formula of the generic form:
#   y~x0+x1+x3*x4 + s(x5)+ s(x6,x7) ....
# and returns:
# 1. a model formula for the parametric part: pf (and pfok indicating whether it has terms)
# 2. a list of descriptors for the smooths: smooth.spec
# this is function does the work, and is called by in interpret.gam
# 'textra' is optional text to add to term labels
# 'extra.special' is label of extra smooth within formula.
{ p.env <- environment(gf) # environment of formula
  tf <- terms.formula(gf,specials=c("s","te","ti","t2",extra.special)) # specials attribute indicates which terms are smooth

  terms <- attr(tf,"term.labels") # labels of the model terms 
  nt <- length(terms) # how many terms?

  if (attr(tf,"response") > 0) {  # start the replacement formulae
    response <- as.character(attr(tf,"variables")[2])
  } else { 
    response <- NULL
  }
  sp <- attr(tf,"specials")$s     # array of indices of smooth terms 
  tp <- attr(tf,"specials")$te    # indices of tensor product terms
  tip <- attr(tf,"specials")$ti   # indices of tensor product pure interaction terms
  t2p <- attr(tf,"specials")$t2   # indices of type 2 tensor product terms
  zp <- if (is.null(extra.special)) NULL else attr(tf,"specials")[[extra.special]]
  off <- attr(tf,"offset") # location of offset in formula

  ## have to translate sp, tp, tip, t2p (zp) so that they relate to terms,
  ## rather than elements of the formula...
  vtab <- attr(tf,"factors") # cross tabulation of vars to terms
  if (length(sp)>0) for (i in 1:length(sp)) {
    ind <- (1:nt)[as.logical(vtab[sp[i],])]
    sp[i] <- ind # the term that smooth relates to
  }
  if (length(tp)>0) for (i in 1:length(tp)) {
    ind <- (1:nt)[as.logical(vtab[tp[i],])]
    tp[i] <- ind # the term that smooth relates to
  } 
  if (length(tip)>0) for (i in 1:length(tip)) {
    ind <- (1:nt)[as.logical(vtab[tip[i],])]
    tip[i] <- ind # the term that smooth relates to
  } 
  if (length(t2p)>0) for (i in 1:length(t2p)) {
    ind <- (1:nt)[as.logical(vtab[t2p[i],])]
    t2p[i] <- ind # the term that smooth relates to
  }
  if (length(zp)>0) for (i in 1:length(zp)) {
    ind <- (1:nt)[as.logical(vtab[zp[i],])]
    zp[i] <- ind # the term that smooth relates to
  } ## re-referencing is complete

  k <- kt <- kti <- kt2 <- ks <- kz <- kp <- 1 # counters for terms in the 2 formulae
  len.sp <- length(sp)
  len.tp <- length(tp)
  len.tip <- length(tip)
  len.t2p <- length(t2p)
  len.zp <- length(zp)
  ns <- len.sp + len.tp + len.tip + len.t2p + len.zp# number of smooths
  pav <- av <- rep("",0)
  smooth.spec <- list()
  #mgcvat <- "package:mgcv" %in% search() ## is mgcv in search path?
  mgcvns <- loadNamespace('mgcv')
  if (nt) for (i in 1:nt) { # work through all terms
    if (k <= ns&&((ks<=len.sp&&sp[ks]==i)||(kt<=len.tp&&tp[kt]==i)||(kz<=len.zp&&zp[kz]==i)||
                  (kti<=len.tip&&tip[kti]==i)||(kt2<=len.t2p&&t2p[kt2]==i))) { # it's a smooth
      ## have to evaluate in the environment of the formula or you can't find variables 
      ## supplied as smooth arguments, e.g. k <- 5;gam(y~s(x,k=k)), fails,
      ## but if you don't specify namespace of mgcv then stuff like 
      ## loadNamespace('mgcv'); k <- 10; mgcv::interpret.gam(y~s(x,k=k)) fails (can't find s)
      ## eval(parse(text=terms[i]),envir=p.env,enclos=loadNamespace('mgcv')) fails??
      ## following may supply namespace of mgcv explicitly if not on search path...
      ## If 's' etc are masked then we can fail even if mgcv on search path, hence paste
      ## of explicit mgcv reference into first attempt...

      st <- try(eval(parse(text=paste("mgcv::",terms[i],sep="")),envir=p.env),silent=TRUE)
      if (inherits(st,"try-error")) st <- 
            eval(parse(text=terms[i]),enclos=p.env,envir=mgcvns)

      if (!is.null(textra)) { ## modify the labels on smooths with textra
        pos <- regexpr("(",st$lab,fixed=TRUE)[1]
        st$label <- paste(substr(st$label,start=1,stop=pos-1),textra,
                    substr(st$label,start=pos,stop=nchar(st$label)),sep="")
      }
      smooth.spec[[k]] <- st
      if (ks<=len.sp&&sp[ks]==i) ks <- ks + 1 else # counts s() terms
      if (kt<=len.tp&&tp[kt]==i) kt <- kt + 1 else # counts te() terms
      if (kti<=len.tip&&tip[kti]==i) kti <- kti + 1 else # counts ti() terms
      if (kt2<=len.t2p&&t2p[kt2]==i) kt2 <- kt2 + 1 # counts t2() terms
      else kz <- kz + 1
      k <- k + 1      # counts smooth terms 
    } else {          # parametric
      av[kp] <- terms[i] ## element kp on rhs of parametric
      kp <- kp+1    # counts parametric terms
    }
  }    
  if (!is.null(off)) { ## deal with offset 
    av[kp] <- as.character(attr(tf,"variables")[1+off])
    kp <- kp+1          
  }

  pf <- paste(response,"~",paste(av,collapse=" + "))
  if (attr(tf,"intercept")==0) {
    pf <- paste(pf,"-1",sep="")
    if (kp>1) pfok <- 1 else pfok <- 0
  } else { 
    pfok <- 1;if (kp==1) { 
      pf <- paste(pf,"1"); 
    }
  }

  fake.formula <- pf

  if (length(smooth.spec)>0) 
  for (i in 1:length(smooth.spec)) {
    nt <- length(smooth.spec[[i]]$term)
    ff1 <- paste(smooth.spec[[i]]$term[1:nt],collapse="+")
    fake.formula <- paste(fake.formula,"+",ff1)
    if (smooth.spec[[i]]$by!="NA") {
      fake.formula <- paste(fake.formula,"+",smooth.spec[[i]]$by)
      av <- c(av,smooth.spec[[i]]$term,smooth.spec[[i]]$by)
    } else av <- c(av,smooth.spec[[i]]$term)
  }
  fake.formula <- as.formula(fake.formula,p.env)
  if (length(av)) {
    pred.formula <- as.formula(paste("~",paste(av,collapse="+")))
    pav <- all.vars(pred.formula) ## trick to strip out 'offset(x)' etc...
    pred.formula <- reformulate(pav) 
  environment(pred.formula) <- environment(gf)
  } else  pred.formula <- ~1
  ret <- list(pf=as.formula(pf,p.env),pfok=pfok,smooth.spec=smooth.spec,
            fake.formula=fake.formula,response=response,fake.names=av,
            pred.names=pav,pred.formula=pred.formula)
  class(ret) <- "split.gam.formula"
  ret
} ## interpret.gam0

interpret.gam <- function(gf,extra.special=NULL) {
## wrapper to allow gf to be a list of formulae or 
## a single formula. This facilitates general penalized 
## likelihood models in which several linear predictors 
## may be involved...
##
## The list syntax is as follows. The first formula must have a response on
## the lhs, rather than labels. For m linear predictors, there 
## must be m 'base formulae' in linear predictor order. lhs labels will 
## be ignored in a base formula. Empty base formulae have '-1' on rhs.
## Further formulae have labels up to m labels 1,...,m on the lhs, in a 
## syntax like this: 3 + 5 ~ s(x), which indicates that the same s(x) 
## should be added to both linear predictors 3 and 5. 
## e.g. A bivariate normal model with common expected values might be
## list(y1~-1,y2~-1,1+2~s(x)), whereas if the second component was contaminated 
## by something else we might have list(y1~-1,y2~s(v)-1,1+2~s(x)) 
## 
## For a list argument, this routine returns a list of split.formula objects 
## with an extra field "lpi" indicating the linear predictors to which each 
## contributes...
  if (is.list(gf)) {
    d <- length(gf)

    ## make sure all formulae have a response, to avoid
    ## problems with parametric sub formulae of the form ~1
    #if (length(gf[[1]])<3) stop("first formula must specify a response")
    resp <- gf[[1]][2]

    ret <- list()
    pav <- av <- rep("",0)
    nlp <- 0 ## count number of linear predictors (may be different from number of formulae)
    for (i in 1:d) {
      textra <- if (i==1) NULL else paste(".",i-1,sep="") ## modify smooth labels to identify to predictor  

      lpi <- getNumericResponse(gf[[i]]) ## get linear predictors to which this applies, if explicit
      if (length(lpi)==1) warning("single linear predictor indices are ignored")
      if (length(lpi)>0) gf[[i]][[2]] <- NULL else { ## delete l.p. labels from formula response 
        nlp <- nlp + 1;lpi <- nlp ## this is base formula for l.p. number nlp       
      }
      ret[[i]] <- interpret.gam0(gf[[i]],textra,extra.special=extra.special)
      ret[[i]]$lpi <- lpi ## record of the linear predictors to which this applies

      ## make sure all parametric formulae have a response, to avoid
      ## problems with parametric sub formulae of the form ~1
      respi <- rep("",0) ## no extra response terms
      if (length(ret[[i]]$pf)==2) { 
        ret[[i]]$pf[3] <- ret[[i]]$pf[2];ret[[i]]$pf[2] <- resp
        respi <- rep("",0)
      } else if (i>1) respi <- ret[[i]]$response ## extra response terms
      av <- c(av,ret[[i]]$fake.names,respi) ## accumulate all required variable names 
      pav <- c(pav,ret[[i]]$pred.names) ## predictors only 
    } 
    av <- unique(av) ## strip out duplicate variable names
    pav <- unique(pav)
    if (length(av)>0) {
      ## work around - reformulate with response = "log(x)" will treat log(x) as a name,
      ## not the call it should be... 
      fff <- formula(paste(ret[[1]]$response,"~ ."))
      ret$fake.formula <- reformulate(av,response=ret[[1]]$response) 
      environment(ret$fake.formula) <- environment(gf[[1]]) 
      ret$fake.formula[[2]] <- fff[[2]] ## fix messed up response
    } else ret$fake.formula <- ret[[1]]$fake.formula ## create fake formula containing all variables
    ret$pred.formula <- if (length(pav)>0) reformulate(pav) else ~1 ## predictor only formula
    environment(ret$pred.formula) <- environment(gf[[1]])
    ret$response <- ret[[1]]$response 
    ret$nlp <- nlp ## number of linear predictors
    for (i in 1:d) if (max(ret[[i]]$lpi)>nlp||min(ret[[i]]$lpi)<1) stop("linear predictor labels out of range")
    class(ret) <- "split.gam.formula"
    return(ret)
  } else interpret.gam0(gf,extra.special=extra.special)  
} ## interpret.gam


## Now some test code.

environment(interpret.gam) <- environment(mgcv::interpret.gam)
environment(interpret.gam0) <- environment(mgcv:::interpret.gam0)
assignInNamespace("interpret.gam", interpret.gam, "mgcv")
assignInNamespace("interpret.gam0", interpret.gam0, "mgcv")

set.seed(123)

mod.gam.1 <- local({
    var.fun <- function(x){x^2}

    x <- runif(100)
    y <- x + rnorm(100, 0, var.fun(x))

    gam(formula = list(y ~ x,
                                         ~ var.fun(x)),
            family = gaulss(link = list("log", "logb")))
})

pred.fun.1 <- function(x){
    predict(mod.gam.1,
                  newdata = data.frame("x" = x))
}

pred.fun.1(1)

关于r - 使用包内的自定义方差函数从 gaulss-gams 进行预测时出现环境问题，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/59615362/

r - 使用包内的自定义方差函数从 gaulss-gams 进行预测时出现环境问题

上一篇：ruby - RUBYOPT env 有哪些选项？

下一篇：android - Retrofit 2 不适用于 Android API 级别 17