loops - Loop for repeated k-fold cross-validation

Tags: loops cross-validation

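I am trying to write a loop that performs repeated k-fold cross-validation: run 10-fold cross-validation and repeat the process 10 times, to obtain the predictions and the resulting 10 AUC values.

I seem to be missing something in the loop that moves the computed predictions into the corresponding column of the empty data frame created for the k-fold results. My output contains only the scores from the last k-fold run rather than all 10. I also still need to obtain the AUC value for each k-fold validation.

Is there a way to incorporate the AUC calculation into the loop to get those values? I would appreciate it if someone could guide me.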


library(cvTools)
library(glmnet)
#library(pROC) 

k <- 10 #the number of folds
x <- structure(list(PC1 = c(-2.03456672313651, -1.73707505007147, 
-2.03456672313652, -0.255368300655119, -1.73707505007143, -2.03456672313651, 
-0.37500359723752, -2.03456672313651, -2.03456672313651, 3.47288460329945, 
-0.734187869112349, -0.0134149056651377, 0.0942929078885968, 
-2.0345667231365, -2.03456672313651), PC2 = c(0.112471741011579, 
0.133858302549922, 0.1124717410116, 2.61374131070885, 0.133858302549994, 
0.11247174101158, -0.158995891265301, 0.11247174101159, 0.112471741011592, 
-0.260528749768208, -0.503925189558291, 0.194756984230433, 0.318778158034713, 
0.112471741011598, 0.11247174101159), PC3 = c(2.44850389170835, 
2.3403087394181, 2.44850389170835, -2.46949441441314, 2.34030873941815, 
2.44850389170834, 0.123937826076267, 2.44850389170836, 2.44850389170835, 
-0.367483430521022, -0.155846438581532, 0.509441984698824, 0.612816030555617, 
2.44850389170836, 2.44850389170835), PC4 = c(0.112471741011652, 
0.133858302549981, 0.11247174101165, 0.00436673840662417, 0.133858302549995, 
0.112471741011656, -0.158995891265306, 0.112471741011666, 0.112471741011661, 
-0.260528749768211, -0.290253126970872, -2.28110627358792, 0.318778158034689, 
0.11247174101168, 0.11247174101167), PC5 = c(0.112471741011684, 
0.13385830255004, 0.112471741011692, 0.00436673840662224, 0.133858302550053, 
0.112471741011681, -0.158995891265284, 0.112471741011697, 0.112471741011696, 
-0.260528749768212, 1.20999715739728, -1.91404159432553, 0.318778158034758, 
0.112471741011709, 0.112471741011692)), .Names = c("PC1", "PC2", 
"PC3", "PC4", "PC5"), row.names = c("O35245", "O35286", "O54949", 
"O54991", "O88569", "P14733", "P16054", "P21619", "P24369", "P37889", 
"P40201", "P57080", "P60843", "P63085", "P99029"), class = "data.frame")

folds <- cvFolds(NROW(x), K=k)

mypreds <- data.frame(matrix(0, nrow(x),ncol = 10)) # create a dataframe to store results of all 10 k-fold repetitions
row.names(mypreds) <- row.names(x) # row names for the dataframe
names(mypreds) <- paste("K", (1:10), sep = "") # column names

set.seed(123)

j <- 1
nsim = 10 # number of repetitions 

x$kfoldlpred <- rep(0,nrow(x)) # append a column to original dataframe to temporarily store results of each k-fold

# the loop for repeated cross-validation
repeatcv <- function(){
  while (j <= nsim){
    for(i in 1:k){
      train <- x[folds$subsets[folds$which != i], ] #Set the training set 
      train_response <- responseY1[folds$subsets[folds$which != i]] # set the training set response

      validation <- x[folds$subsets[folds$which == i], ] #Set the validation set

      lasso_newglm <- glmnet(as.matrix(train), train_response, alpha = 1,family = "binomial") #Get your new logistic regression model (just fit on the train data)
      lasso_cvglm <- cv.glmnet(as.matrix(train), train_response, alpha = 1, family = "binomial",type.measure = "deviance")
      lasso_newpred <- predict(lasso_newglm,newx = as.matrix(validation), type = "response", s = c(lasso_cvglm$lambda.min)) #Get the predictions for the validation set (from the model just fit on the train data)

      x[folds$subsets[folds$which == i],]$kfoldlpred <- lasso_newpred
    }
    mypreds[,i] <- x$kfoldlpred
    j <- j+1
  }
  return(mypreds)
}

Best answer

The caret package provides repeated cross-validation out of the box. Here is a minimal working example:

library(caret)
model <- train(x = iris[51:150,1:2], 
               y = factor(iris[51:150,5]), 
               method = 'glmnet', 
               metric='ROC', 
               trControl = trainControl(method = 'repeatedcv', # repeated cross validation
                                        number = 10, # nr of partitions
                                        repeats = 10, # nr of repeats
                                        classProbs = TRUE, 
                                        summaryFunction = twoClassSummary))

model$resample gives you the AUC for every partition and repeat (10 partitions and 10 repeats, so 10*10=100 values):

> model$resample
     ROC Sens Spec     Resample
1   0.90  0.8  0.8 Fold05.Rep10
2   0.98  1.0  0.8 Fold04.Rep10
3   0.80  1.0  0.2 Fold01.Rep09
4   0.64  0.4  0.8 Fold08.Rep07
5   0.86  0.8  0.8 Fold05.Rep06
[...]
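
If one AUC value per repetition is wanted rather than one per fold, the Resample labels can be split and the ROC column averaged within each repeat. The following is only a sketch in base R: the data frame `resample` is a hand-typed copy of the five rows printed above (not the full 100-row result), and the names `resample`, `Rep` and `auc_by_rep` are made up for illustration. With a fitted model, `model$resample` would take the place of the hand-typed data frame.

```r
# Hand-typed copy of the five rows shown above (illustration only;
# a real run would use model$resample with all 100 rows).
resample <- data.frame(
  ROC  = c(0.90, 0.98, 0.80, 0.64, 0.86),
  Sens = c(0.8, 1.0, 1.0, 0.4, 0.8),
  Spec = c(0.8, 0.8, 0.2, 0.8, 0.8),
  Resample = c("Fold05.Rep10", "Fold04.Rep10", "Fold01.Rep09",
               "Fold08.Rep07", "Fold05.Rep06"),
  stringsAsFactors = FALSE
)

# Keep only the part after the dot, e.g. "Fold05.Rep10" -> "Rep10"
resample$Rep <- sub("^.*\\.", "", resample$Resample)

# Mean AUC within each repetition
auc_by_rep <- aggregate(ROC ~ Rep, data = resample, FUN = mean)
auc_by_rep
```

Averaging fold-level AUCs is one possible summary; pooling the held-out predictions of a repeat and computing a single AUC from them is another, and the two generally give slightly different numbers.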

By the way: if you also want to plot the ROC curves over all partitions and repeats, see this question
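
For readers who would rather keep a hand-written loop, here is a minimal sketch of how the loop in the question could be restructured. Reading the question's code, three things stand out: the results are written to `mypreds[,i]` instead of `mypreds[,j]`, so every repetition overwrites the same column; the folds are created once outside the loop, so each repetition would reuse the same partition; and the temporary `kfoldlpred` column is added to `x` and therefore ends up among the predictors. The sketch below is not the original poster's method: it uses simulated data (the question's `responseY1` is not shown), replaces the glmnet lasso fit with base-R `glm()` so that it runs without extra packages, and computes AUC with a hand-written rank-based helper. The names `auc_rank`, `repeat_cv`, `n`, and the simulated `x` and `y` are all assumptions made for illustration.

```r
# Sketch only: simulated data and base-R glm() stand in for the question's
# PCA scores, the undefined responseY1, and the glmnet lasso fit.
set.seed(123)

n    <- 100  # number of observations (assumed)
k    <- 10   # folds per repetition
nsim <- 10   # number of repetitions

# Hypothetical predictors and binary response
x <- data.frame(PC1 = rnorm(n), PC2 = rnorm(n))
y <- rbinom(n, 1, plogis(1.5 * x$PC1 - x$PC2))

# Rank-based AUC (Mann-Whitney U statistic); hand-written helper
auc_rank <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

repeat_cv <- function(x, y, k, nsim) {
  # One column of held-out predictions per repetition
  preds <- matrix(NA_real_, nrow(x), nsim,
                  dimnames = list(rownames(x), paste0("Rep", seq_len(nsim))))
  aucs <- numeric(nsim)
  dat  <- cbind(x, y = y)
  for (j in seq_len(nsim)) {
    # Draw a fresh random fold assignment for every repetition
    fold_id <- sample(rep(seq_len(k), length.out = nrow(x)))
    for (i in seq_len(k)) {
      test <- fold_id == i
      fit  <- glm(y ~ ., data = dat[!test, ], family = binomial)
      preds[test, j] <- predict(fit, newdata = x[test, , drop = FALSE],
                                type = "response")
    }
    # One AUC per repetition, from the pooled held-out predictions
    aucs[j] <- auc_rank(preds[, j], y)
  }
  list(preds = preds, auc = aucs)
}

res <- repeat_cv(x, y, k, nsim)
dim(res$preds)  # rows = observations, columns = repetitions
res$auc         # one AUC value per repetition
```

Predictions are stored by row index into column `j`, so no temporary column has to be attached to `x`. To get back to the lasso, the `glm()` and `predict()` calls would be replaced by the `cv.glmnet()` fit and prediction from the question; that substitution is not shown or tested here.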

Regarding loops - loop for repeated k-fold cross-validation, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37557985/
