r - Tidymodels (fitting a random forest with fit_resamples()): Fold01: internal: Error: Must group by variables found in `.data`

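Overview

I have built a random forest regression model, and my aim is to fit it with the function fit_resamples() and then tune the hyperparameters. However, I get the error message below.

Error message: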

    ! Fold01: model: tune columns were requested but there were 14 predictors in the data. 14 will be u...
    x Fold01: internal: Error: Must group by variables found in `.data`.
    * Column `mtry` is not found.
    ! Fold02: model: tune columns were requested but there were 14 predictors in the data. 14 will be u...
    x Fold02: internal: Error: Must group by variables found in `.data`.
    * Column `mtry` is not found.
    ! Fold03: model: tune columns were requested but there were 14 predictors in the data. 14 will be u...
    x Fold03: internal: Error: Must group by variables found in `.data`.
    * Column `mtry` is not found.

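I have searched online for a solution, but I could not find a question that matches my specific problem. I am not an advanced R user, and I am doing my best to work my way slowly through the Tidymodels packages.

If anyone can help with this error message, I would be very grateful.

Many thanks in advance.

R code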

    set.seed(45L)

    # Load libraries
    library(tidymodels)
    library(ranger)
    library(dplyr)
    library(future)  # provides plan(); not loaded explicitly in the original post

    # Split this single dataset into two: a training set and a testing set
    data_split <- initial_split(FID)

    # Create data frames for the two sets
    train_data <- training(data_split)
    test_data  <- testing(data_split)

    # Resample the data with 10-fold cross-validation (10-fold by default)
    cv <- vfold_cv(train_data, v = 10)

    ## Produce the recipe
    rec <- recipe(Frequency ~ ., data = FID) %>%
      # remove variables with zero variance
      step_nzv(all_predictors(), freq_cut = 0, unique_cut = 0) %>%
      # prepare test data to handle previously unseen factor levels
      step_novel(all_nominal()) %>%
      # replace missing numeric observations with the median
      step_medianimpute(all_numeric(), -all_outcomes(), -has_role("id vars")) %>%
      # dummy-code categorical variables
      step_dummy(all_nominal(), -has_role("id vars"))

    ## Produce the random forest model
    mod_rf <- rand_forest(
      mtry  = tune(),
      trees = 1000,
      min_n = tune()
    ) %>%
      set_mode("regression") %>%
      set_engine("ranger")

    ## Workflow
    wflow_rf <- workflow() %>%
      add_model(mod_rf) %>%
      add_recipe(rec)

    ## Fit model
    plan(multisession)

    fit_rf <- fit_resamples(
      wflow_rf,
      cv,
      metrics = metric_set(rmse, rsq),
      control = control_resamples(save_pred = TRUE,
                                  extract = function(x) extract_model(x))
    )

    # Running this produces the error message quoted at the top of the question.

Data frame FID

    FID <- structure(list(
      Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
               2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
               2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017),
      Month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
                          1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
                          1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L),
                        .Label = c("January", "February", "March", "April", "May", "June",
                                   "July", "August", "September", "October", "November", "December"),
                        class = "factor"),
      Frequency = c(36, 28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33,
                    33, 29, 31, 23, 8, 9, 7, 40, 41, 41, 30, 30,
                    44, 37, 41, 42, 20, 0, 7, 27, 35, 27, 43, 38),
      Days = c(31, 28, 31, 30, 6, 0, 0, 29, 15, 29, 29, 31,
               31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27,
               31, 28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)),
      row.names = c(NA, -36L), class = "data.frame")

Best answer

If you look at the help page for fit_resamples():

fit_resamples() computes a set of performance metrics across one or more resamples. It does not perform any tuning (see tune_grid() and tune_bayes() for that)

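fit_resamples() cannot resolve the tune() placeholders for mtry and min_n, which is why it fails. Most likely you need to tune first, and then run fit_resamples() with the parameters obtained from tuning, for example: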

    rf_grid <- expand.grid(mtry = 2:4, min_n = c(10, 15, 20))

    mod_rf <- rand_forest(
      mtry  = tune(),
      trees = 1000,
      min_n = tune()
    ) %>%
      set_mode("regression") %>%
      set_engine("ranger")

    wflow_rf <- workflow() %>%
      add_model(mod_rf) %>%
      add_recipe(rec)

    rf_res <- wflow_rf %>%
      tune_grid(resamples = cv, grid = rf_grid)

Find the best parameters:

    show_best(rf_res, metric = "rmse")

    # A tibble: 5 x 7
       mtry min_n .metric .estimator  mean     n std_err
      <int> <dbl> <chr>   <chr>      <dbl> <int>   <dbl>
    1     4    10 rmse    standard    7.87    10   0.743
    2     4    15 rmse    standard    8.27    10   0.649
    3     3    10 rmse    standard    8.49    10   0.682
    4     3    15 rmse    standard    8.97    10   0.620
    5     4    20 rmse    standard    9.49    10   0.605

Then run it again with those values (here mtry = 4 and min_n = 10, the combination with the lowest RMSE):

    mod_rf <- rand_forest(mtry = 4, trees = 1000, min_n = 10) %>%
      set_mode("regression") %>%
      set_engine("ranger")

    wflow_rf <- workflow() %>%
      add_model(mod_rf) %>%
      add_recipe(rec)

    fit_rf <- fit_resamples(
      wflow_rf,
      cv,
      metrics = metric_set(rmse, rsq),
      control = control_resamples(save_pred = TRUE,
                                  extract = function(x) extract_model(x))
    )
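As a variation on the tune-then-refit steps, the best values do not have to be copied by hand: select_best() and finalize_workflow() from the tune package can fill in the tune() placeholders. The following is only a minimal sketch under stated assumptions. It uses the built-in mtcars data as a stand-in for FID so that it runs on its own, and the object names (cv_demo, rec_demo, grid_demo and so on), the smaller grid and the reduced number of trees are all illustrative choices, not part of the original answer.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, tune, yardstick
library(ranger)

set.seed(45L)

# Assumption: mtcars stands in for FID so the sketch is self-contained
cv_demo  <- vfold_cv(mtcars, v = 5)
rec_demo <- recipe(mpg ~ ., data = mtcars)

# Model specification with tune() placeholders, as in the question
mod_tune <- rand_forest(mtry = tune(), trees = 200, min_n = tune()) %>%
  set_mode("regression") %>%
  set_engine("ranger")

wflow_tune <- workflow() %>%
  add_model(mod_tune) %>%
  add_recipe(rec_demo)

# Step 1: tune over a small, illustrative grid
grid_demo <- expand.grid(mtry = 2:4, min_n = c(5, 10))
res_demo  <- tune_grid(wflow_tune, resamples = cv_demo, grid = grid_demo,
                       metrics = metric_set(rmse, rsq))

# Step 2: pick the combination with the lowest RMSE
best_demo <- select_best(res_demo, metric = "rmse")

# Step 3: replace the tune() placeholders with the selected values
wflow_final <- finalize_workflow(wflow_tune, best_demo)

# Step 4: fit_resamples() now works, because nothing is left to tune
fit_demo <- fit_resamples(wflow_final, cv_demo,
                          metrics = metric_set(rmse, rsq),
                          control = control_resamples(save_pred = TRUE))

collect_metrics(fit_demo)
```

With the objects from the answer, the same pattern would presumably be select_best(rf_res, metric = "rmse") followed by finalize_workflow(wflow_rf, ...), applied while wflow_rf still contains the tune() placeholders.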

Regarding "r - Tidymodels (fitting a random forest with fit_resamples()): Fold01: internal: Error: Must group by variables found in `.data`", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65350163/
