r - Computing kernel ridge regression in R for model selection

Tags: r regression model-comparison

I have a data frame, df:

df<-structure(list(P = c(794.102395099402, 1299.01021921817, 1219.80731174175, 
1403.00786976395, 742.749487463385, 340.246973543409, 90.3220586792255, 
195.85557320714, 199.390867672674, 191.4970921278, 334.452413539092, 
251.730350291822, 235.899165861309, 442.969718728163, 471.120193046119, 
458.464154601097, 950.298132134912, 454.660729622624, 591.212003320456, 
546.188716055825, 976.994105334083, 1021.67000560164, 945.965200876724, 
932.324768081307, 3112.60002304117, 624.005047807736, 0, 937.509240627289, 
892.926195849975, 598.564015734103, 907.984807726741, 363.400837339461, 
817.629824627294, 2493.75851182081, 451.149000503123, 1028.41455932241, 
615.640039284434, 688.915621065535, NaN, 988.21297, NaN, 394.7, 
277.7, 277.7, 492.7, 823.6, 1539.1, 556.4, 556.4, 556.4), T = c(11.7087701201175, 
8.38748953516909, 9.07065637842101, 9.96978059247473, 2.87026334756687, 
-1.20497751697385, 1.69057148825093, 2.79168506923385, -1.03659741363293, 
-2.44619473778322, -1.0414166493637, -0.0616510891024765, -2.19566614081763, 
2.101408628412, 1.30197334094966, 1.38963309876057, 1.11283280896495, 
0.570385633957982, 1.05118063842584, 0.816991857384802, 8.95069454902333, 
6.41067954598958, 8.42110173395973, 13.6455092557636, 25.706509843239, 
15.5098014530832, 6.60783204117648, 6.27004335176393, 10.0769600264915, 
3.05237224011361, 7.52869186722913, 11.2970127691776, 6.60356510073103, 
7.3210245298803, 8.4723724171517, 21.6988324356057, 7.34952593890056, 
6.04325232771032, NaN, 25.990913731, NaN, 1.5416666667, 15.1416666667, 
15.1416666667, 0.825, 4.3666666667, 7.225, -2.075, -2.075, -2.075
), A = c(76.6, 52.5, 3.5, 15, 71.5, 161.833333333333, 154, 72.5, 
39, 40, 23, 14.5, 5.5, 78, 129, 73.5, 100, 10, 3, 29.5, 65, 44, 
68.5, 56.5, 101, 52.1428571428571, 66.5, 1, 106, 36.6, 21.2, 
10, 135, 46.5, 17.5, 35.5, 86, 70.5, 65, 97, 30.5, 96, 79, 11, 
162, 350, 42, 200, 50, 250), Y = c(1135.40733061247, 2232.28817154825, 
682.15711101488, 1205.97307573068, 1004.2559099408, 656.537378609781, 
520.796355544007, 437.780508459633, 449.167726897157, 256.552344558528, 
585.618137514404, 299.815636674633, 230.279491515383, 1051.74875971674, 
801.07750760983, 572.337961145761, 666.132923644351, 373.524159859929, 
128.198042456082, 528.555426408071, 1077.30188477292, 1529.43757814094, 
1802.78658590423, 1289.80342084379, 3703.38329098125, 1834.54460388103, 
1087.48954802548, 613.15010408836, 1750.11457900004, 704.123482171384, 
1710.60321283154, 326.663507855032, 1468.32489464969, 1233.05517321796, 
852.500007182098, 1246.5605930537, 1186.31346316832, 1460.48566379373, 
2770, 3630, 3225, 831, 734, 387, 548.8, 1144, 1055, 911, 727, 
777)), .Names = c("P", "T", "A", "Y"), row.names = c(NA, -50L
), class = "data.frame")

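I want to do model selection using kernel ridge regression. I have already done this with a simple stepwise regression (see below), but now I would like to do it with kernel ridge regression.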
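library(caret)
Step <- train(Y ~ P + T + A, data = df,
              preProcess = c("center", "scale"),
              method = "lmStepAIC",
              ## 'repeats' only takes effect with method = "repeatedcv"
              trControl = trainControl(method = "repeatedcv", number = 10, repeats = 10),
              na.action = na.omit)  ## drop the rows containing NaN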

Does anyone know how to do model selection with kernel ridge regression?
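One way to frame model selection with kernel ridge regression, sketched here as an assumption rather than a standard recipe: fit an RBF-kernel KRR on every non-empty subset of the predictors and keep the subset with the lowest k-fold cross-validated RMSE. The code is base R only. The helper names (rbf_kernel, krr_fit, krr_predict, cv_rmse, select_predictors) are hypothetical, the kernel assumes kernlab's rbfdot form exp(-sigma * ||x - x'||^2), and the sigma/lambda grid is arbitrary. Taking the grid minimum on the same folds is optimistic; nested cross-validation would be stricter.

```r
## Sketch: kernel ridge regression (RBF kernel) with k-fold CV over predictor
## subsets. Base R only; all helper names are hypothetical.

# RBF kernel in kernlab's rbfdot form: exp(-sigma * ||x - x'||^2)
rbf_kernel <- function(X1, X2, sigma) {
  d2 <- outer(rowSums(X1^2), rowSums(X2^2), "+") - 2 * tcrossprod(X1, X2)
  exp(-sigma * pmax(d2, 0))  # pmax guards against tiny negative round-off
}

# Closed-form KRR on a centred response: alpha = (K + lambda * I)^-1 (y - mean(y))
krr_fit <- function(X, y, sigma, lambda) {
  K <- rbf_kernel(X, X, sigma)
  alpha <- solve(K + lambda * diag(nrow(X)), y - mean(y))
  list(X = X, alpha = alpha, sigma = sigma, y_mean = mean(y))
}

# Prediction: kernel between new and training rows times alpha, plus the mean
krr_predict <- function(fit, Xnew) {
  drop(rbf_kernel(Xnew, fit$X, fit$sigma) %*% fit$alpha) + fit$y_mean
}

# Cross-validated RMSE for one (sigma, lambda) pair and a fixed fold assignment
cv_rmse <- function(X, y, sigma, lambda, folds) {
  sq_err <- numeric(0)
  for (f in unique(folds)) {
    test <- folds == f
    fit <- krr_fit(X[!test, , drop = FALSE], y[!test], sigma, lambda)
    pred <- krr_predict(fit, X[test, , drop = FALSE])
    sq_err <- c(sq_err, (y[test] - pred)^2)
  }
  sqrt(mean(sq_err))
}

# Score every non-empty subset of predictors; lower CV RMSE is better
select_predictors <- function(data, response, predictors, k = 5,
                              sigmas = c(0.1, 0.5, 1, 2),
                              lambdas = c(0.01, 0.1, 1), seed = 1) {
  data <- data[complete.cases(data[, c(response, predictors)]), ]  # drops NA/NaN rows
  y <- data[[response]]
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # same folds for every model
  subsets <- unlist(lapply(seq_along(predictors), function(m)
    combn(predictors, m, simplify = FALSE)), recursive = FALSE)
  grid <- expand.grid(sigma = sigmas, lambda = lambdas)
  scores <- sapply(subsets, function(vars) {
    X <- scale(as.matrix(data[, vars, drop = FALSE]))  # centre and scale, as in the caret call
    min(mapply(function(s, l) cv_rmse(X, y, s, l, folds), grid$sigma, grid$lambda))
  })
  res <- data.frame(model = sapply(subsets, paste, collapse = " + "), cv_rmse = scores)
  res[order(res$cv_rmse), ]
}
```

With the question's data this would be called as select_predictors(df, "Y", c("P", "T", "A")); the rows containing NaN are dropped before the folds are drawn. Scaling on the full data before splitting leaks a little information into the test folds; scaling inside each fold avoids that at the cost of more code.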

Best answer

Using the CVST package that etienne linked, here is how to train and predict with the kernel ridge regression learner:

library(CVST)

## Assuming df is already in your environment; drop the rows that contain NaN
df_cc = df[complete.cases(df), ]
d = constructData(x = as.matrix(df_cc[, c("P", "T", "A")]), y = df_cc$Y) ## Structure data in CVST format
krr_learner = constructKRRLearner()   ## Build the base learner
params = list(kernel = 'rbfdot', sigma = 100, lambda = 0.01) ## Learner params; the documentation defines lambda as .1/getN(d)

krr_trained = krr_learner$learn(d, params)

## To predict, format your test data, 'dTest', the same way as 'd'
pred = krr_learner$predict(krr_trained, dTest)
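On the choice sigma = 100: assuming CVST hands kernel = 'rbfdot' and sigma to kernlab, the kernel is exp(-sigma * ||x - x'||^2). In the raw units of P, T and A no two rows of the question's data are closer than 1 unit, so every off-diagonal kernel entry is below exp(-100), effectively 0, and K is essentially the identity. A common rule of thumb, swapped in here as an assumption and not part of the answer, is the median heuristic: sigma = 1 / (median squared pairwise distance) on scaled predictors. The base-R sketch below measures how many kernel entries vanish for a given sigma; near_zero_offdiag and median_heuristic_sigma are hypothetical helper names.

```r
## Sketch: how sigma interacts with the scale of the predictors in an RBF
## kernel exp(-sigma * ||x - x'||^2) (kernlab's rbfdot form). Base R only.

# Fraction of off-diagonal kernel entries that are effectively zero
near_zero_offdiag <- function(X, sigma, tol = 1e-8) {
  K <- exp(-sigma * as.matrix(dist(X))^2)
  off <- K[upper.tri(K)]
  mean(off < tol)
}

# Median heuristic (an assumption): 1 / median squared pairwise distance
median_heuristic_sigma <- function(X) {
  1 / median(as.numeric(dist(X))^2)
}
```

For the question's data one could compare near_zero_offdiag(X_raw, 100), where X_raw is as.matrix(df[complete.cases(df), c("P", "T", "A")]), with near_zero_offdiag(scale(X_raw), median_heuristic_sigma(scale(X_raw))). The heuristic only gives a starting point; sigma and lambda would still normally be tuned by cross-validation.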

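The slightly painful part of CVST is the intermediate data-preparation step, which requires you to call the constructData function. The code above is adapted from the example on page 7 of the documentation.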

It's worth mentioning that when I ran this code on your example, I got the following singularity warning:
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
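The answer does not pin down where the singularity comes from, and nothing here confirms the cause. As a sketch of two things worth checking before calling learn: whether the predictors contain NaN (in the question's df, rows 39 and 41 have NaN in both P and T), and how well conditioned the system K + lambda * I is for the chosen sigma and lambda. check_krr_inputs is a hypothetical helper; the kernel again assumes kernlab's rbfdot form, and CVST may scale lambda differently internally.

```r
## Sketch: pre-flight checks for a kernel ridge regression fit.
## Base R only; check_krr_inputs is a hypothetical helper.
check_krr_inputs <- function(X, sigma, lambda) {
  X <- as.matrix(X)
  ok <- complete.cases(X)                    # FALSE for rows with NA or NaN
  Xc <- X[ok, , drop = FALSE]
  K <- exp(-sigma * as.matrix(dist(Xc))^2)   # rbfdot-style kernel matrix
  A <- K + lambda * diag(nrow(Xc))           # the system KRR has to solve
  list(bad_rows = which(!ok),
       n_used = nrow(Xc),
       rcond = rcond(A))                     # reciprocal condition number; near 0 means near-singular
}
```

A call such as check_krr_inputs(df[, c("P", "T", "A")], sigma = 100, lambda = 0.01) lists the NaN rows and reports the conditioning of the system that remains once they are dropped.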

A similar question about computing kernel ridge regression in R for model selection can be found on Stack Overflow: https://stackoverflow.com/questions/33416799/
