Random subset in R as a function of one column

Tags: r

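I want to randomly extract n rows from a data frame as a function of one column, that is, n rows for each value of that column. So in this example: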

# Reproducible example
df <- as.data.frame(matrix(0,2e+6,2))
df$V1 <- runif(nrow(df),0,1)
df$V2 <- sample(c(1:10),nrow(df), replace=TRUE)
df$V3 <- sample(c("A","B","C"),nrow(df), replace=TRUE)

For instance, I would like to extract n=10 rows for each value of V2.

# Example of what I need with one value of V2
df1 <- df[which(df$V2==1),]
str(df1)
df1[sample(1:nrow(df1),10),]

I don't want to write a for loop, so I tried this line with tapply:

df_objective <- tapply(df$V1, df$V2, function(x) df[sample(1:nrow(df),10),"V2"])

This is close to what I want, but I lose the third column of the data frame.

I tried this to get the complete subset:

df_objective <- by(cbind(df$V1,df$V3), df$V2, function(x) df[sample(1:nrow(df),10),"V2"])

But that did not help.

How can I keep all the columns in the subset?

Best answer

It sounds like you're just looking for something like sample_n in "dplyr":

library(dplyr)
df %>% group_by(V2) %>% sample_n(10)
# Source: local data frame [100 x 3]
# Groups: V2
# 
#            V1 V2 V3
# 1  0.51099392  1  B
# 2  0.87098866  1  A
# 3  0.13647752  1  B
# 4  0.15348834  1  B
# 5  0.94096127  1  B
# 6  0.05673849  1  A
# 7  0.69960842  1  C
# 8  0.02246671  1  C
# 9  0.88903430  1  B
# 10 0.52128253  1  A
# ..        ... .. ..
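
In dplyr 1.0.0 and later, sample_n is superseded by slice_sample, so the same grouped draw can also be written as in the sketch below. The smaller data frame df_small and the set.seed call are assumptions added here so the example runs quickly and repeatably; they are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only so the draw is repeatable

# Smaller stand-in for the question's data frame (1,000 rows instead of 2e6)
df_small <- data.frame(
  V1 = runif(1000),
  V2 = sample(1:10, 1000, replace = TRUE),
  V3 = sample(c("A", "B", "C"), 1000, replace = TRUE)
)

# Draw 10 rows within each value of V2; every column is kept
res <- df_small %>%
  group_by(V2) %>%
  slice_sample(n = 10) %>%
  ungroup()

table(res$V2)  # number of sampled rows per value of V2
```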

Alternatively, there's stratified from my "splitstackshape" package:

library(splitstackshape)
stratified(df, "V2", 10)
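
If neither package is available, a base R version is also possible. The tapply and by attempts in the question sample from the whole df inside the function, so the grouping has no effect on which rows are drawn. A minimal sketch of one way around that is to split the row indices by V2, sample within each group of indices, and subset df once at the end. The smaller data frame, the seed, and the names n and idx are assumptions made for this illustration.

```r
set.seed(42)  # assumption: fixed seed only so the draw is repeatable

# Smaller stand-in for the question's data frame (2e4 rows instead of 2e6)
df <- data.frame(
  V1 = runif(2e4),
  V2 = sample(1:10, 2e4, replace = TRUE),
  V3 = sample(c("A", "B", "C"), 2e4, replace = TRUE)
)

n <- 10

# Split the row numbers by V2 and draw n row numbers within each group.
# i[sample.int(length(i), n)] avoids the sample(x) surprise when a group
# has a single row; it raises an error if a group has fewer than n rows.
idx <- unlist(lapply(split(seq_len(nrow(df)), df$V2),
                     function(i) i[sample.int(length(i), n)]))

# Subset the full data frame once, so all columns are kept
df_objective <- df[idx, ]

table(df_objective$V2)  # number of sampled rows per value of V2
```

Working with row indices rather than splitting the data frame itself avoids copying each group before sampling, which may matter with 2e6 rows.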

Source: the original question, "Random subset in R as a function of one column", is on Stack Overflow: https://stackoverflow.com/questions/30195683/
