r - Missing values when creating training and test data with caret

Tags: r statistics r-caret

My question is about how to handle missing values when fitting a model with caret. A small sample of my data is below:

       df <- dput(dat)
       structure(list(LagO3 = c(NA, NA, NA, 40, 45, NA), RH = c(69.4087524414062, 
       79.9608383178711, 64.4592437744141, 66.4207077026367, 66.0899200439453, 
       91.3353729248047), SR = c(298.928888888889, 300.128888888889, 
       303.688888888889, 304.521111111111, 303.223333333333, 294.716666666667
       ), ST = c(317.9917578125, 317.448253038194, 311.039059244792, 
       312.557927517361, 321.252841796875, 330.512212456597), Tmx = c(294.770359293045, 
       294.897191864461, 295.674552786042, 296.247345044048, 296.108238352818, 
       294.594430242372), CWTE = c(0, 1, 0, 0, 0, 0), CWTW = c(0, 0, 
       0, 0, 0, 0), o3 = c(NA, NA, NA, 52, 55, NA)), .Names = c("LagO3", 
       "RH", "SR", "ST", "Tmx", "CWTE", "CWTW", "o3"), row.names = c("1", 
       "2", "3", "4", "5", "6"), class = "data.frame")

The problem is that one of my predictors has NAs in several places, and the outcome (o3) also has NAs, but in different positions. So I tried:

model <- train(x = na.omit(x.training), y = na.omit(training$o3), method = "lmStepAIC",
               direction="backward", trControl = control)

But then x and y end up with different lengths. I also tried:
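The length mismatch happens because na.omit() is applied to x and y independently, so each call drops its own set of rows. A minimal base R sketch, assuming hypothetical toy data in which the NAs of a predictor and of the outcome fall in different rows (as described above for the full data set), shows the mismatch and one way around it: build a single logical index with complete.cases() over x and y together, then subset both with it. The names x, y, keep, x_clean and y_clean are illustrative only.

```r
# Hypothetical toy data: the predictor LagO3 and the outcome y
# have their NAs in different rows.
x <- data.frame(LagO3 = c(NA, 38, 40, 45, NA),
                RH    = c(69, 80, 64, 66, 66))
y <- c(50, NA, 52, 55, 60)

# Filtering each object separately drops different rows.
nrow(na.omit(x))    # 3 rows left (rows 1 and 5 dropped)
length(na.omit(y))  # 4 values left (element 2 dropped)

# Keep only the rows that are complete in x AND in y.
keep <- complete.cases(x, y)
x_clean <- x[keep, , drop = FALSE]
y_clean <- y[keep]

# Both objects now cover the same rows (rows 3 and 4).
stopifnot(nrow(x_clean) == length(y_clean))
```

x_clean and y_clean stay row-aligned, so they could be passed as x and y to train(). Even when two separate na.omit() calls happen to return equal lengths, the rows they keep need not correspond, which is why a joint index is the safer choice.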

 model <- train(x = x.training, y = training$o3, na.action = na.pass,
                method = "lmStepAIC", direction = "backward", trControl = control)

which fails with:


Error in quantile.default(y, probs = seq(0, 1, length = cuts)) : missing values and NaN's not allowed if 'na.rm' is FALSE
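The message is raised by base R's quantile() (the call shown is quantile.default(y, ...)), which stops on NA unless na.rm = TRUE. With na.action = na.pass the NAs in o3 apparently reach a step inside train() that computes quantiles of the outcome; that this is caret grouping a numeric y before building resampling folds is an assumption, not something shown here. The sketch below does not call caret; it only reproduces the base R behaviour on the o3 values from the sample data.

```r
# The outcome column o3 from the question's sample data.
y <- c(NA, NA, NA, 52, 55, NA)

# quantile() with the default na.rm = FALSE refuses NA values.
msg <- tryCatch(
  quantile(y, probs = seq(0, 1, length.out = 3)),
  error = function(e) conditionMessage(e)
)
print(msg)  # "missing values and NaN's not allowed if 'na.rm' is FALSE"

# Skipping the NAs (na.rm = TRUE) lets the same call run.
q <- quantile(y, probs = seq(0, 1, length.out = 3), na.rm = TRUE)
print(q)    # 0%: 52, 50%: 53.5, 100%: 55
```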




You need to use the na.action argument of train, set to na.omit. As the documentation for na.action says (type ?train):

A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)
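To see what "rejection of cases with missing values on any required variable" means in practice, base R's model.frame() applies an na.action in the same spirit: only the variables named in the formula count. The sketch below uses values rounded from the question's sample; it illustrates na.omit semantics in base R and is not a claim about exactly how train() routes na.action internally.

```r
# Rounded copy of four columns from the question's sample data.
dat <- data.frame(
  LagO3 = c(NA, NA, NA, 40, 45, NA),
  RH    = c(69.41, 79.96, 64.46, 66.42, 66.09, 91.34),
  SR    = c(298.93, 300.13, 303.69, 304.52, 303.22, 294.72),
  o3    = c(NA, NA, NA, 52, 55, NA)
)

# o3 and LagO3 are required here and contain NAs,
# so only the fully complete rows survive.
mf_all <- model.frame(o3 ~ LagO3 + RH + SR, data = dat, na.action = na.omit)
nrow(mf_all)      # 2
rownames(mf_all)  # "4" "5"

# RH and SR have no NAs, so no rows are dropped,
# even though other columns of dat contain NAs.
mf_sub <- model.frame(RH ~ SR, data = dat, na.action = na.omit)
nrow(mf_sub)      # 6
```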


model <- train(x = x.training, y = training$o3, 
              method = "lmStepAIC",direction="backward", 
              trControl = control, na.action=na.omit)


> model <- train(x = x.training, y = y.training, method = "lmStepAIC",direction="backward",
+                na.action=na.omit)
Start:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE + CWTW

Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE

Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx

Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST

Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR

Step:  AIC=-129.7
.outcome ~ LagO3 + RH

Step:  AIC=-129.7
.outcome ~ LagO3

Step:  AIC=-129.7
.outcome ~ 1

A similar question, "r - Missing values when creating training and test data with caret", can be found on Stack Overflow: https://stackoverflow.com/questions/28831197/


