I have the following data frame in R:
ID   Unique_Id  Date                 Status
I-1  UR-112     2020-01-01 14:15:16  Approved
I-2  UR-112     2020-02-12 14:15:16  In Process
I-3  UR-112     2020-03-23 14:15:16  In Process
I-4  UR-113     2020-01-01 14:15:16  Hold
I-5  UR-113     2020-04-11 14:15:16  Hold
I-6  UR-114     2020-04-07 14:15:16  Approved
I-7  UR-114     2020-05-08 14:15:16  Approved
I-8  UR-114     2020-05-09 14:15:16  In Process
I-9  UR-115     2020-01-18 14:15:16  Approved
I-10 UR-115     2020-03-23 14:15:16  Approved
I-11 UR-116     2020-02-11 14:15:16  Approved
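I need to create a subset of 3 randomly chosen Unique_Id values that is spread across all Dates, and the three Unique_Id values must together cover every available Status.
Required output: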
ID   Unique_Id  Date                 Status
I-1  UR-112     2020-01-01 14:15:16  Approved
I-2  UR-112     2020-02-12 14:15:16  In Process
I-3  UR-112     2020-03-23 14:15:16  In Process
I-4  UR-113     2020-01-01 14:15:16  Hold
I-5  UR-113     2020-04-11 14:15:16  Hold
I-11 UR-116     2020-02-11 14:15:16  Approved
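To make the requirement concrete, here is a minimal R sketch that checks whether a candidate subset satisfies it: exactly three distinct Unique_Id values whose rows together include every Status found in the full data. The helper name `covers_all_status` is hypothetical (not from the original post), and the Date column is left out because the check does not use it; the question does not say how "spread across all Dates" should be measured, so that part is not tested here.

```r
# Rebuild the example data (Date omitted: the check below does not use it)
x <- data.frame(
  ID = paste0("I-", 1:11),
  Unique_Id = c("UR-112", "UR-112", "UR-112", "UR-113", "UR-113",
                "UR-114", "UR-114", "UR-114", "UR-115", "UR-115", "UR-116"),
  Status = c("Approved", "In Process", "In Process", "Hold", "Hold",
             "Approved", "Approved", "In Process", "Approved", "Approved",
             "Approved"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when `sub` holds exactly `n_ids` distinct
# Unique_Id values and its rows contain every Status present in `full`
covers_all_status <- function(sub, full, n_ids = 3) {
  length(unique(sub$Unique_Id)) == n_ids &&
    setequal(unique(sub$Status), unique(full$Status))
}

# The required output from the question: UR-112, UR-113 and UR-116
desired <- x[x$Unique_Id %in% c("UR-112", "UR-113", "UR-116"), ]
covers_all_status(desired, x)
# [1] TRUE
```

Applied to the IDs in the required output it returns TRUE; a set such as UR-114, UR-115, UR-116 returns FALSE, because none of those IDs has a Hold row.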
Best answer
Maybe use a loop like the following, which draws one not-yet-chosen Unique_Id for each Status:
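id <- character(0)
# Retry until one Unique_Id has been drawn for each of the 3 Status values
while (length(id) != 3) {
  id <- character(0)
  for (i in unique(x$Status)) {
    # Candidate IDs with this Status that were not picked for an earlier Status
    cand <- setdiff(x$Unique_Id[x$Status == i], id)
    if (length(cand) == 0) break  # dead end: the while loop starts over
    id <- c(id, cand[sample.int(length(cand), 1)])
  }
}
x[x$Unique_Id %in% id, ]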
# ID Unique_Id Date Status
#4 I-4 UR-113 2020-01-01 14:15:16 Hold
#5 I-5 UR-113 2020-04-11 14:15:16 Hold
#6 I-6 UR-114 2020-04-07 14:15:16 Approved
#7 I-7 UR-114 2020-05-08 14:15:16 Approved
#8 I-8 UR-114 2020-05-09 14:15:16 In Process
#9 I-9 UR-115 2020-01-18 14:15:16 Approved
#10 I-10 UR-115 2020-03-23 14:15:16 Approved
Data:
x <- structure(list(ID = c("I-1", "I-2", "I-3", "I-4", "I-5", "I-6",
"I-7", "I-8", "I-9", "I-10", "I-11"), Unique_Id = c("UR-112",
"UR-112", "UR-112", "UR-113", "UR-113", "UR-114", "UR-114", "UR-114",
"UR-115", "UR-115", "UR-116"), Date = c("2020-01-01 14:15:16",
"2020-02-12 14:15:16", "2020-03-23 14:15:16", "2020-01-01 14:15:16",
"2020-04-11 14:15:16", "2020-04-07 14:15:16", "2020-05-08 14:15:16",
"2020-05-09 14:15:16", "2020-01-18 14:15:16", "2020-03-23 14:15:16",
"2020-02-11 14:15:16"), Status = c("Approved", "In Process",
"In Process", "Hold", "Hold", "Approved", "Approved", "In Process",
"Approved", "Approved", "Approved")), class = "data.frame", row.names = c(NA,
-11L))
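As an alternative to the retry loop (this is not the accepted answer's method), a sketch that enumerates every combination of three Unique_Id values with `combn()`, keeps those whose rows cover all Status values, and picks one of them uniformly at random. The sequential per-Status draw is not uniform over valid triples: with this data it returns UR-112, UR-113, UR-114 with probability 1/2 and each of the other four valid triples with probability 1/8. The trade-off is that the number of combinations grows as choose(n, 3), so this assumes a modest number of distinct IDs.

```r
x <- data.frame(
  ID = paste0("I-", 1:11),
  Unique_Id = c("UR-112", "UR-112", "UR-112", "UR-113", "UR-113",
                "UR-114", "UR-114", "UR-114", "UR-115", "UR-115", "UR-116"),
  Date = c("2020-01-01 14:15:16", "2020-02-12 14:15:16", "2020-03-23 14:15:16",
           "2020-01-01 14:15:16", "2020-04-11 14:15:16", "2020-04-07 14:15:16",
           "2020-05-08 14:15:16", "2020-05-09 14:15:16", "2020-01-18 14:15:16",
           "2020-03-23 14:15:16", "2020-02-11 14:15:16"),
  Status = c("Approved", "In Process", "In Process", "Hold", "Hold",
             "Approved", "Approved", "In Process", "Approved", "Approved",
             "Approved"),
  stringsAsFactors = FALSE
)

ids <- unique(x$Unique_Id)
all_status <- unique(x$Status)

# combn() returns a character matrix with one candidate triple per column
triples <- combn(ids, 3)

# Keep only the triples whose rows include every Status
valid <- apply(triples, 2, function(trip) {
  setequal(unique(x$Status[x$Unique_Id %in% trip]), all_status)
})
valid_triples <- triples[, valid, drop = FALSE]

# Pick one valid triple uniformly at random
# (sample.int() errors if no valid triple exists)
chosen <- valid_triples[, sample.int(ncol(valid_triples), 1)]
x[x$Unique_Id %in% chosen, ]
```

With this data 5 of the 10 possible triples are valid; every one of them contains UR-113, the only ID with a Hold row.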
Regarding "r - How to create a subset of data with equal random distribution in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67089986/