R caret renames a column in a data.table after training

Tags: r data.table r-caret

For some reason, the train function in the caret package changes the name of the response variable in my data.table. Here is a toy example:

library(caret)
library(data.table)
DT <- data.table(x = rnorm(10), y = rnorm(10))
DT
 #            x          y
 #1: -1.7844589  0.4834738
 #2: -0.3519577 -0.4644998
 #3:  1.0697762 -0.9183105
 #4: -0.2624022 -1.0952624
 #5: -1.0875959 -1.0267012
 #6:  0.1442927 -0.8669099
 #7:  0.3886957  0.2272433
 #8: -0.1625200  0.8286582
 #9: -0.5419324 -0.0526076
 #10:  0.4669790  0.2916581
# 5-fold cross-validation, repeated once
cv.ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 1)
fit <- train(y ~ x, data = DT, method = 'lm', trControl = cv.ctrl)
DT  # the response column y now shows up as .outcome
 #            x   .outcome
 #1: -1.7844589  0.4834738
 #2: -0.3519577 -0.4644998
 #3:  1.0697762 -0.9183105
 #4: -0.2624022 -1.0952624
 #5: -1.0875959 -1.0267012
 #6:  0.1442927 -0.8669099
 #7:  0.3886957  0.2272433
 #8: -0.1625200  0.8286582
 #9: -0.5419324 -0.0526076
 #10:  0.4669790  0.2916581

I know I can rename the column back after training, but that gets repetitive when I have many models to train. Is this the intended behavior?

Edit: added session info

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-24     ggplot2_0.9.3.1  lattice_0.20-29  data.table_1.9.2

loaded via a namespace (and not attached):
 [1] car_2.0-19       codetools_0.2-8  colorspace_1.2-4 digest_0.6.4     foreach_1.4.2    grid_3.1.0       gtable_0.1.2    
 [8] iterators_1.0.7  MASS_7.3-31      munsell_0.4.2    nnet_7.3-8       plyr_1.8.1       proto_0.3-10     Rcpp_0.11.1     
[15] reshape2_1.2.2   scales_0.2.4     stringr_0.6.2    tools_3.1.0     

Best answer

Update: this has been fixed in the current development version of data.table, 1.9.5. From NEWS:

names<-.data.table works as intended on data.table unaware packages with Rv3.1.0+. Closes #476 and #825. Thanks to ezbentley for reporting here on SO and to @narrenfrei.
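
To decide whether the workaround further down is still needed, a quick check of the installed data.table version may help. This is only a sketch: it assumes the fix shipped in 1.9.5 as the NEWS entry says, and `needs_workaround` is a name made up for illustration.

```r
# Assumption: per the NEWS entry above, the fix landed in data.table 1.9.5.
installed <- utils::packageVersion("data.table")

# Hypothetical flag, not part of caret or data.table.
needs_workaround <- installed < "1.9.5"

if (needs_workaround) {
  message("data.table ", installed, ": pass as.data.frame(DT) to train()")
} else {
  message("data.table ", installed, ": the column rename should no longer occur")
}
```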


Similar to @hrbrmstr's suggestion, you could do the following:

library(caret)
library(data.table)
DT <- data.table(x = rnorm(10), y = rnorm(10))
cv.ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 1)
# Pass a data.frame copy so train() never touches DT itself
fit <- train(y ~ x, data = as.data.frame(DT), method = 'lm', trControl = cv.ctrl)
DT  # column names are unchanged
#              x           y
# 1: -0.06027817  1.32641243
# 2:  0.28842856  0.60240700
# 3:  1.14196056  0.97159637
# 4: -0.82907332  0.82955574
# 5:  0.73742357 -0.63901239
# 6:  0.12551649  1.33047527
# 7: -1.12110293 -0.03315772
# 8:  0.29933697 -1.52464998
# 9:  1.66046182  0.21068356
# 10: -0.09126467  2.02206078

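This way DT is left untouched: train() only sees a data.frame copy, so DT keeps its column names and its data.table class.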
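
Since the question mentions training many models, the same idea can be wrapped once and reused. The following is a minimal sketch, not something caret provides: `train_on_copy` and `method_names` are hypothetical names, and the choice of 'lm' and 'glm' is only for illustration.

```r
library(caret)
library(data.table)

set.seed(1)
DT <- data.table(x = rnorm(10), y = rnorm(10))
cv.ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 1)

# Hypothetical helper (not part of caret): always hand train() a
# data.frame copy, so the original data.table is never modified.
train_on_copy <- function(form, dt, ...) {
  train(form, data = as.data.frame(dt), ...)
}

# Illustrative list of caret methods to fit on the same data.
method_names <- c('lm', 'glm')
fits <- lapply(method_names, function(m) {
  train_on_copy(y ~ x, DT, method = m, trControl = cv.ctrl)
})
names(fits) <- method_names

names(DT)  # inspect the column names after all models are trained
```

Because the conversion happens inside the helper, no renaming step is needed after each call, at the cost of one copy of the data per model.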

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/23256177/
