r - Missing values when creating training and test data with caret

Tags: r statistics r-caret

My question is about how to handle missing values when fitting a model with caret. A small sample of my data is shown below:

       df <- dput(dat)
       structure(list(LagO3 = c(NA, NA, NA, 40, 45, NA), RH = c(69.4087524414062, 
       79.9608383178711, 64.4592437744141, 66.4207077026367, 66.0899200439453, 
       91.3353729248047), SR = c(298.928888888889, 300.128888888889, 
       303.688888888889, 304.521111111111, 303.223333333333, 294.716666666667
       ), ST = c(317.9917578125, 317.448253038194, 311.039059244792, 
       312.557927517361, 321.252841796875, 330.512212456597), Tmx = c(294.770359293045, 
       294.897191864461, 295.674552786042, 296.247345044048, 296.108238352818, 
       294.594430242372), CWTE = c(0, 1, 0, 0, 0, 0), CWTW = c(0, 0, 
       0, 0, 0, 0), o3 = c(NA, NA, NA, 52, 55, NA)), .Names = c("LagO3", 
       "RH", "SR", "ST", "Tmx", "CWTE", "CWTW", "o3"), row.names = c("1", 
       "2", "3", "4", "5", "6"), class = "data.frame")

The problem is that one of my predictors has NAs in several positions, and the outcome (o3) also has NAs (but in different positions). I first tried:

model <- train(x = na.omit(x.training), y = na.omit(training$o3), method = "lmStepAIC",
               direction="backward", trControl = control)

However, this leaves x and y with different lengths, because each one has its NAs dropped independently. I then tried:

 model <- train(x = x.training, y = training$o3,na.action=na.pass, 
                method = "lmStepAIC",direction="backward",trControl = control)

which fails with the following error:

Error in quantile.default(y, probs = seq(0, 1, length = cuts)) : missing values and NaN's not allowed if 'na.rm' is FALSE

Any suggestions would be greatly appreciated!

Many thanks.

Best answer

You need to set the na.action argument of the train function to na.omit. As the documentation for na.action says (type ?train):

A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)
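To make the two behaviours in that description concrete, here is a minimal base-R sketch. It uses lm rather than caret, and the toy data frame is made up for illustration, so treat it as a demonstration of what na.omit and na.fail do in general rather than of caret's internals.

```r
# Toy data (hypothetical): one NA in a predictor and one in the outcome,
# on different rows.
toy <- data.frame(
  x1 = c(1, 2, NA, 4, 5, 6, 7, 8),
  x2 = c(2, 1, 4, 3, 6, 5, 8, 7),
  y  = c(1.1, NA, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1)
)

# na.omit drops every row that has an NA in any variable the model needs,
# so rows 2 and 3 are excluded before fitting.
fit <- lm(y ~ x1 + x2, data = toy, na.action = na.omit)
n_used <- nobs(fit)

# na.fail makes the same call stop as soon as an NA is found.
failed <- inherits(
  try(lm(y ~ x1 + x2, data = toy, na.action = na.fail), silent = TRUE),
  "try-error"
)
```

Here n_used counts only the rows that are complete across x1, x2 and y, which is the "rejection of cases with missing values on any required variable" the documentation refers to.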

So the following will work:

model <- train(x = x.training, y = training$o3, 
              method = "lmStepAIC",direction="backward", 
              trControl = control, na.action=na.omit)

Output:

> model <- train(x = x.training, y = y.training, method = "lmStepAIC",direction="backward",
+                na.action=na.omit)
Start:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE + CWTW


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR


Step:  AIC=-129.7
.outcome ~ LagO3 + RH


Step:  AIC=-129.7
.outcome ~ LagO3


Step:  AIC=-129.7
.outcome ~ 1
...
...
...
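The two failures described in the question can be reproduced in base R, which may help explain why the rows have to be dropped jointly. The sketch below uses a small made-up x and y (not the asker's data) and filters both with complete.cases so that they stay aligned. Passing the filtered objects to train is an alternative I am assuming would behave the same as na.action = na.omit; it is not something the answer above states.

```r
# Hypothetical toy data: NAs in the predictor and in the outcome,
# at different positions.
x <- data.frame(
  LagO3 = c(NA, 40, 45, 50, NA, 42),
  RH    = c(69, 80, 64, 66, 66, 91)
)
y <- c(50, NA, 55, 52, 48, 47)

# Attempt 1: dropping NAs separately removes different rows from each object,
# so the lengths differ and the remaining rows no longer correspond.
n_x <- nrow(na.omit(x))
n_y <- length(na.omit(y))

# Attempt 2: quantile() refuses NAs unless na.rm = TRUE, which matches the
# error message quoted in the question.
quantile_failed <- inherits(
  try(quantile(y, probs = seq(0, 1, length = 5)), silent = TRUE),
  "try-error"
)

# Joint filtering: keep only rows where every predictor AND the outcome
# are observed.
keep <- complete.cases(x, y)
x_cc <- x[keep, , drop = FALSE]
y_cc <- y[keep]

# x_cc and y_cc now have matching rows and could be passed on, e.g.
# train(x = x_cc, y = y_cc, ...)   # assumption: requires the caret package
```

Filtering once with a shared logical index keeps each outcome value attached to its own predictor row, which separate calls to na.omit cannot guarantee.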

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/28831197/
