r - Splitting a dataset into 60%, 20%, 20%

Tags: r validation split dataset training-data

I am trying to move from 2 sets of data to 3, as described in the title. Here is the script I am using:

set.seed(125)
d <- sample(x = nrow(db), size = nrow(db) * 0.60)
train60 <- db[d, ]
valid40 <- db[-d, ]

Is there a way to modify the script above? I tried adding another line:

valid40 <- db[-d] * 0.2

This does not work.

The current dataset has several factor variables.

I also tried Frank's solution, which uses the cut function, but somehow I ended up with:

Error in cut.default(seq(nrow(df)), nrow(df) * cumsum(c(0, spec)), labels = names(spec)) : lengths of 'breaks' and 'labels' differ

Even after searching online for help, I do not understand this error.

Best answer

If I understand correctly, you want three non-overlapping samples of 60%, 20%, and 20%. I use the iris dataset as an example; it has 150 rows and 5 columns.

samp <- sample(seq_len(nrow(iris)), floor(0.6 * nrow(iris))) ## row indices for the 60/40 split

train60 <- iris[samp, ] ## the 60% chunk
remain40 <- iris[-samp, ] ## the remaining 40%, split again below

samp2 <- sample(seq_len(nrow(remain40)), floor(0.5 * nrow(remain40)))

first20 <- remain40[samp2, ] ## first 20% chunk
secnd20 <- remain40[-samp2, ] ## second 20% chunk

## Check for overlap by row name: 0 means no row appears in more than one chunk.
## (intersect() on data frames compares columns, not rows, so row names are compared instead.)
anyDuplicated(c(rownames(train60), rownames(first20), rownames(secnd20)))
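The same split can also be done in one pass with cut, which is the approach the question tried. The sketch below is a minimal version under a few assumptions: db is a stand-in built from iris, and the names spec, groups, and parts are only illustrative. The error in the question may have come from running the snippet with df unchanged. If no data frame named df exists, R finds the built-in function df, nrow(df) is NULL, and cut receives empty breaks. That is a guess, since the question does not show the exact call.

```r
# Single-pass 60/20/20 split. `db` is a stand-in for the real data (assumption).
set.seed(125)
db <- iris

# Named proportions; the names become the group labels.
spec <- c(train = 0.6, valid = 0.2, test = 0.2)

# cut() bins the row positions 1..n into consecutive groups sized by `spec`;
# sample() then shuffles those labels so that group membership is random.
n <- nrow(db)
groups <- sample(cut(seq_len(n), n * cumsum(c(0, spec)), labels = names(spec)))

# split() returns a named list with one data frame per group.
parts <- split(db, groups)
sapply(parts, nrow)
```

Each row receives exactly one label, so the three chunks cannot overlap, and factor columns keep their levels in every chunk. The pieces are then available as parts$train, parts$valid, and parts$test.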

For "r - Splitting a dataset into 60%, 20%, 20%", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44131087/
