r - Problems doing a grid search on the agaricus mushroom dataset

Tags: r machine-learning r-caret

Here is my code.

library(dplyr)
library(caret)
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package='xgboost')
train <- agaricus.train
test  <- agaricus.test



xgb_grid_1 <- expand.grid(
  nrounds = c(1:10),
  eta = c(seq(0,1,0.1)),
  max_depth = c(2:5),
  gamman = c(seq(0,1,0.1))
)


xgb_trcontrol_1 <- trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",                                                        
  classProbs = TRUE,                                                           
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)


xgb_train1 <- train(
  x = as.matrix(train$data),
  y = train$label,
  trControl = xgb_trcontrol_1,
  tune_grid = xgb_grid_1,
  method = 'xgbTree'
)  

When I run xgb_train1, I get the following error message:

Error in frankv(predicted) : x is a list, 'cols' can not be 0-length
In addition: Warning messages:
1: In train.default(x = train$data, y = train$label, trControl = xgb_trcontrol_1,  :
  You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column.
2: In train.default(x = train$data, y = train$label, trControl = xgb_trcontrol_1,  :
  cannot compute class probabilities for regression

What should I do? Please advise.

Best Answer

There are several issues with your code.

  1. Use the correct argument name: caret::train has no tune_grid argument; it is called tuneGrid. Note also that your grid misspells gamma as gamman, and a grid for method = 'xgbTree' must supply every tuning parameter the method expects (nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample).

  2. You are trying to do classification, but you are supplying a numeric target. This is exactly what the warning tells you: "You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column."

  3. When posting a minimal example here, please keep the computation time as low as possible. In your example that is easy to achieve: your grid alone contains 10 × 11 × 4 × 11 = 4,840 parameter combinations, each fitted 5 times under 5-fold CV, so simply shrink the search space.

Here is code that should work:

    library(caret)
    library(xgboost)
    
    data(agaricus.train, package = "xgboost")
    data(agaricus.test, package='xgboost')
    train <- agaricus.train
    test  <- agaricus.test
    
    train$label <- ifelse(train$label == 0, "no", "yes") #convert target to character or factor
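    # with classProbs = TRUE, caret turns the factor levels into column names,
    # so they must be valid R names ("no"/"yes" work; raw "0"/"1" would not)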
    
    xgb_grid_1 <- expand.grid(
      nrounds = 100,
      eta = c(0.01, 0.001, 0.0001),
      max_depth = c(2, 4, 6, 8, 10),
      gamma = 1,
      colsample_bytree = 0.6,
      min_child_weight = 1,
      subsample = 0.75
    )
    
    xgb_trcontrol_1 <- trainControl(
      method = "cv",
      number = 3,
      search = "grid",
      verboseIter = TRUE,
      returnData = FALSE,
      returnResamp = "all",                                                        
      classProbs = TRUE,                                                           
      summaryFunction = twoClassSummary
    )
    
    xgb_train1 <- caret::train(
      x = as.matrix(train$data),
      y = train$label,
      trControl = xgb_trcontrol_1,
      tuneGrid  = xgb_grid_1,
      metric ="ROC",
      method = 'xgbTree'
    )  
    
    #output
    
    eXtreme Gradient Boosting 
    
    No pre-processing
    Resampling: Cross-Validated (3 fold) 
    Summary of sample sizes: 4343, 4341, 4342 
    Resampling results across tuning parameters:
    
      eta    max_depth  ROC        Sens       Spec     
      1e-04   2         0.9963189  0.9780604  0.9656045
      1e-04   4         0.9999604  0.9985172  0.9974527
      1e-04   6         1.0000000  1.0000000  0.9974527
      1e-04   8         1.0000000  1.0000000  0.9974527
      1e-04  10         1.0000000  1.0000000  0.9974527
      1e-03   2         0.9972687  0.9629358  0.9713391
      1e-03   4         0.9999479  0.9985172  0.9974527
      1e-03   6         1.0000000  1.0000000  0.9974527
      1e-03   8         1.0000000  1.0000000  0.9974527
      1e-03  10         1.0000000  1.0000000  0.9977714
      1e-02   2         0.9990705  0.9780604  0.9757951
      1e-02   4         0.9999674  1.0000000  0.9974527
      1e-02   6         1.0000000  1.0000000  0.9977714
      1e-02   8         1.0000000  1.0000000  0.9977714
      1e-02  10         1.0000000  1.0000000  0.9977714
    
    Tuning parameter 'nrounds' was held constant at a value of 100
    Tuning parameter 'gamma' was held constant at a value of 1
    Tuning parameter 'colsample_bytree' was held constant at a value of 0.6
    Tuning parameter 'min_child_weight' was held constant at a value of 1
    Tuning parameter 'subsample' was held constant at a value of 0.75
    ROC was used to select the optimal model using the largest value.
    The final values used for the model were nrounds = 100, max_depth = 6,
    eta = 1e-04, gamma = 1, colsample_bytree = 0.6, min_child_weight = 1
    and subsample = 0.75.
    
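Once training finishes, the fitted caret object can be used directly for prediction. As a minimal sketch (assuming the agaricus test set loaded above, with the target recoded the same way as in training):

    # predicted classes and class probabilities on the test matrix
    pred_class <- predict(xgb_train1, newdata = as.matrix(test$data))
    pred_prob  <- predict(xgb_train1, newdata = as.matrix(test$data), type = "prob")

    # compare predictions against the recoded true labels
    truth <- factor(ifelse(test$label == 0, "no", "yes"))
    caret::confusionMatrix(pred_class, truth)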

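One more note on the allowParallel = TRUE setting from your original trainControl: caret parallelizes over resampling iterations via foreach, so the flag only takes effect once a parallel backend is registered. A minimal sketch using the doParallel package (one possible backend, assuming it is installed):

    library(doParallel)

    # register a local cluster so caret can fit the CV folds in parallel
    cl <- makePSOCKcluster(4)  # 4 workers; adjust to your machine
    registerDoParallel(cl)

    # ... run caret::train() as above ...

    stopCluster(cl)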
This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/53183734/
