machine-learning - Random forest error in na.fail.default: missing values in object

Tags: machine-learning random-forest r-caret feature-selection

I am running an RF model that works without errors for most variables. However, when I include one variable, duration_in_program, using the following code:

```{r Random Forest Model}
## Run a Random Forest model
mod_rf <-
  train(left_school ~ job_title + gender + marital_status +
          age_at_enrollment + monthly_wage + educational_qualification +
          cityA + cityB + cityC + cityD + duration_in_program, # Equation (outcome and everything else)
        data=train_data, # Training data 
        method = "ranger", # random forest (ranger is much faster than rf)
        metric = "ROC", # area under the curve
        trControl = control_conditions,
        tuneGrid = tune_mtry
  )
mod_rf
```

I get the following error:

```
Error in na.fail.default(list(left_welfare = c(1L, 2L, 2L, 2L, 2L, 2L, : missing values in object
```
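The message is raised by na.fail: when train() is called with a formula, the data are first turned into a model frame, and na.fail stops as soon as any column used in the formula contains an NA. That would explain why the model only fails once duration_in_program is added, assuming that column contains missing values. Here is a minimal base-R reproduction. It uses a small hypothetical data frame (toy) in place of train_data and checks which columns contain NAs:

```r
# Hypothetical toy data standing in for train_data; duration_in_program has NAs
toy <- data.frame(
  left_school = factor(c("yes", "no", "no", "yes", "no")),
  age_at_enrollment = c(19, 22, 20, 25, 21),
  duration_in_program = c(12, NA, 8, 24, NA)
)

# Count missing values per column to find the offending predictor(s)
na_counts <- colSums(is.na(toy))
print(na_counts[na_counts > 0])

# Without the NA column, the model frame builds fine under na.fail
ok_frame <- model.frame(left_school ~ age_at_enrollment,
                        data = toy, na.action = na.fail)

# Adding the NA column makes na.fail stop with the same message as above
err <- tryCatch(
  model.frame(left_school ~ age_at_enrollment + duration_in_program,
              data = toy, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(err)
```

On real data, running colSums(is.na(train_data)) before calling train() shows which predictors need handling.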

Best answer

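Assuming train() comes from caret, you can use the na.action argument to specify a function that handles NAs. The default is na.fail. A very common choice is na.omit. The randomForest package provides na.roughfix, which will "impute missing values by median/mode".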

```r
mod_rf <-
  train(left_school ~ job_title + gender + marital_status +
          age_at_enrollment + monthly_wage + educational_qualification +
          cityA + cityB + cityC + cityD + duration_in_program, # Equation (outcome and everything else)
        data = train_data, # Training data
        method = "ranger", # random forest (ranger is much faster than rf)
        metric = "ROC", # area under the curve
        trControl = control_conditions,
        tuneGrid = tune_mtry,
        na.action = na.omit # drop rows with any NA; randomForest::na.roughfix imputes instead
  )
mod_rf
```
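The two options behave differently. na.omit discards every row that has an NA in any formula column, which can shrink the training set considerably. Median/mode imputation keeps all rows. The sketch below compares them on a small hypothetical data frame. rough_impute is a hypothetical base-R helper written for illustration. It mimics the idea behind randomForest::na.roughfix (median for numeric columns, most frequent level for factors) and is not that function's actual implementation:

```r
# Hypothetical toy data: one NA in a numeric column, one in a factor column
toy <- data.frame(
  duration_in_program = c(12, NA, 8, 24, 10),
  gender = factor(c("f", "m", NA, "f", "f"))
)

# Option 1: na.omit drops every row containing any NA (rows 2 and 3 here)
dropped <- na.omit(toy)
print(nrow(dropped))

# Option 2: hypothetical helper imputing median (numeric) or mode (factor),
# similar in spirit to randomForest::na.roughfix
rough_impute <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else if (is.factor(x)) {
      counts <- table(x)  # table() ignores NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

imputed <- rough_impute(toy)
print(imputed)
```

Here na.omit keeps 3 of 5 rows, while the imputed version keeps all 5. It fills duration_in_program with the median 11 and gender with the most frequent level "f".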

Source: the original Stack Overflow question "machine-learning - Random forest error in na.fail.default: missing values in object": https://stackoverflow.com/questions/59257544/
