R formula with as.factor(): any way to specify the argument as a variable's content instead of directly by name?

Tags: r

For the discussion below, I'll create this fake training data frame:

> dataset = data.frame(result=c("yes","yes","no","no","no"),
                       s1=seq(0,8,2), s2=seq(1,9,2))
> dataset
  result s1 s2
1    yes  0  1
2    yes  2  3
3     no  4  5
4     no  6  7
5     no  8  9
>

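I'm trying to train several kernlab KSVM models from several data frames similar to the one shown above. In each data frame the result column is actually named differently (it is named after whatever the model trained on that dataset is supposed to predict).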

I'm still pretty new to R, so the syntax I'm using is just modeled on code I cut and pasted from Rattle's Log tab (no pun intended):

trainedModel = ksvm(as.factor(result) ~ ., data=dataset[, c(input, target)], ...)

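...where result is the name of a column in the dataset data frame. I understand that the expression as.factor(result) ~ . is a formula: the thing on the left of the ~ is somehow modeled from the things on the right of the ~, and the . just means "every other column of the data not already used on the left of the ~". At least, I think that's what it means.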
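One way to check that reading of the formula is to ask terms() what the . expands to. The following is a minimal base-R sketch, not Rattle's own code: the values given to input and target are assumptions standing in for whatever Rattle's log defined, and the column subset mirrors the data= argument of the ksvm() call above.

```r
# Fake training data from the question.
dataset <- data.frame(result = c("yes", "yes", "no", "no", "no"),
                      s1 = seq(0, 8, 2), s2 = seq(1, 9, 2))

# Assumed values: `input` holds the predictor names, `target` the response name.
input  <- c("s1", "s2")
target <- "result"

# Same subset as data=dataset[, c(input, target)] in the ksvm() call.
training <- dataset[, c(input, target)]

f <- as.factor(result) ~ .

# terms() expands `.` against the columns of `data`; a column that already
# appears on the left-hand side (here `result`) is left out of the expansion.
rhs <- attr(terms(f, data = training), "term.labels")
print(rhs)   # expected: "s1" "s2"

# model.frame() also evaluates the left-hand side, so the response is a factor.
mf <- model.frame(f, data = training)
print(levels(model.response(mf)))   # expected: "no" "yes"
```

So the . stands for every column of data that is not used on the left of the ~, which is why the column subset in the ksvm() call decides which predictors the model sees.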

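My problem is that I want to be able to create and train these models programmatically, and the name of the target column in the input dataset changes from one dataset to the next.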

How can I specify colnames(dataset)[1] (that is, a column name determined at run time and not known when writing the code) in place of result in as.factor(result)?

Best answer

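See ?as.formula, which lets you build a formula from a string assembled with paste. Putting the two together, you can create a formula based on a variable, for example (with result_column holding the target column's name as a string, such as colnames(dataset)[1]):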

as.formula(paste("as.factor(",result_column,") ~ ."))
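The same idea extends to the original goal of handling several data frames whose target columns have different names. The sketch below assumes, as in the question, that the target is the first column of each data frame; the data frames a and b, the outcome column, and the helper make_formula() are hypothetical. It uses paste0() so no extra spaces end up inside the parentheses (the paste() version above parses to the same formula), and it only checks each formula with model.frame() instead of calling ksvm(), so it runs without kernlab installed.

```r
# Two hypothetical data frames whose target columns have different names.
datasets <- list(
  a = data.frame(result  = c("yes", "yes", "no", "no", "no"),
                 s1 = seq(0, 8, 2), s2 = seq(1, 9, 2)),
  b = data.frame(outcome = c("up", "down", "up", "down", "up"),
                 s1 = 1:5, s2 = 5:1)
)

# Hypothetical helper: build `as.factor(<target>) ~ .` from a name stored in a string.
make_formula <- function(target) {
  as.formula(paste0("as.factor(", target, ") ~ ."))
}

# Assumes the target is the first column, i.e. colnames(d)[1].
formulas <- lapply(datasets, function(d) make_formula(colnames(d)[1]))
print(deparse(formulas$b))

# Check what each formula resolves to against its own data frame.
frames <- Map(function(f, d) model.frame(f, data = d), formulas, datasets)
print(names(frames$b))
```

Each element of formulas could then be passed to ksvm() in place of the hard-coded as.factor(result) ~ ., together with the matching data frame. An alternative that keeps as.factor() out of the formula is to convert the column first, dataset[[target]] <- as.factor(dataset[[target]]), and then build the simpler formula from paste(target, "~ .").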

Source: this question about R formulas with as.factor() was found on Stack Overflow: https://stackoverflow.com/questions/29981322/
