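Suppose I want to draw samples of varying length from an arbitrary number of different probability distributions, with a weighted probability of drawing from each distribution.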
It seems like I should be able to do this with purrr's map functions, but I'm struggling...
library(tidyverse)
set.seed(20171127)
# sample from 5 different probability distributions
dists <- tibble(
  samp_distA = round(rnorm(n = 1000, mean = 17, sd = 4)),
  samp_distB = round(rnorm(n = 1000, mean = 13, sd = 4)),
  samp_distC = round(rnorm(n = 1000, mean = 13, sd = 4)),
  samp_distD = round(rbeta(n = 1000, 2, 8) * 10),
  samp_distE = round(rnorm(n = 1000, mean = 8, sd = 3))
)
# define number of samples to be drawn for each group
n.times <- c(20,15,35,8,6)
# define weights to be used for sampling from dists
probs <- tibble(
  A = c(0.80, 0.05, 0.05, 0.05, 0.05),
  B = c(0.05, 0.80, 0.05, 0.05, 0.05),
  C = c(0.05, 0.05, 0.80, 0.05, 0.05),
  D = c(0.05, 0.05, 0.05, 0.80, 0.80),
  E = c(0.05, 0.05, 0.05, 0.05, 0.80)
)
# sample from dists, n.times, and using probs as weights...
output <- map2(sample, size=n.times, weight=probs, tbl=dists)
#...doesn't work
Any suggestions are greatly appreciated.
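A detail in the weight table worth checking: column D sums to 1.75 (three 0.05 entries plus two 0.80 entries) rather than 1. sample() accepts weights that do not sum to one and rescales them, so the answer's code still runs, and with D as written the draws for that group come from samp_distD and samp_distE with equal probability (0.80 / 1.75, about 46% each). Whether the second 0.80 was intended is an assumption this sketch cannot settle; it only makes the effective weights visible, using a base-R data.frame in place of the tibble:

```r
# Weight table from the question (plain data.frame so only base R is needed).
probs <- data.frame(
  A = c(0.80, 0.05, 0.05, 0.05, 0.05),
  B = c(0.05, 0.80, 0.05, 0.05, 0.05),
  C = c(0.05, 0.05, 0.80, 0.05, 0.05),
  D = c(0.05, 0.05, 0.05, 0.80, 0.80),
  E = c(0.05, 0.05, 0.05, 0.05, 0.80)
)

# Column totals: D comes out at 1.75 instead of 1.
col_totals <- colSums(probs)
print(col_totals)

# sample() rescales prob internally, so dividing each column by its total
# gives the weights that are effectively used.
probs_norm <- as.data.frame(lapply(probs, function(w) w / sum(w)))
print(round(probs_norm$D, 4))
```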
Best answer
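set.seed(123)
map2(
  n.times,                               # .x: sample size for each group
  map(probs, rep, each = nrow(dists)),   # .y: group weights, one per value of the stacked dists
  sample, x = flatten_dbl(dists), replace = TRUE  # draw from all 5000 values, with replacement
)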
# [[1]]
# [1] 15 13 18 6 15 15 12 8 9 12 7 17 14 12 15 10 18 19 24 24
#
# [[2]]
# [1] 12 2 15 16 14 17 11 11 10 12 6 19 13 12 13
#
# [[3]]
# [1] 10 9 16 12 13 11 10 18 14 19 16 16 12 19 4 15 19 19 13 14 15 10 14 12 10
# [26] 8 18 19 7 8 21 8 19 10 9
#
# [[4]]
# [1] 3 3 2 15 1 4 14 2
#
# [[5]]
# [1] 9 14 10 6 12 8
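The answer works because sample() receives three pieces per group: the pooled vector x (all five columns stacked end to end by flatten_dbl()), the group's size from n.times, and a weight vector built by repeating each of the five weights nrow(dists) times, so that weight i lines up with the i-th block of 1,000 values. Because x and replace are passed by name, map2() fills the remaining arguments of sample() in order: .x becomes size and .y becomes prob. A small base-R check of that alignment, using hypothetical toy columns (toy, a, b) with non-overlapping values so the source of each draw is obvious:

```r
# Hypothetical toy data: column a holds 1:3, column b holds 101:103,
# so every draw reveals which column it came from.
toy <- data.frame(a = 1:3, b = 101:103)

# Stack the columns end to end (the same role flatten_dbl(dists) plays).
pooled <- unlist(toy, use.names = FALSE)

# One weight per column, repeated once per row so it lines up with pooled.
col_weights <- c(1, 0)
row_weights <- rep(col_weights, each = nrow(toy))
print(row_weights)  # 1 1 1 0 0 0

set.seed(1)
draws <- sample(pooled, size = 50, replace = TRUE, prob = row_weights)

# With weight 0 on column b, every draw must come from column a.
print(all(draws %in% toy$a))
```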
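Note: I'm not sure about your reply to MrFlick's comment: "an 80% chance of choosing all values from samp_distA". To me it is much more intuitive to go the other way: "each of the 10 values has an 80% chance of coming from samp_distA"... so that's what I did. Can you confirm you want the former?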
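If the intended reading is the other one (one distribution picked per group using the probs weights, after which every value for that group is drawn from it), the selection step moves outside the value-level sample() call. A minimal base-R sketch under that assumption; pick_then_sample() is a hypothetical helper, not part of the question or answer, and the data are recreated from the question so the block runs on its own:

```r
set.seed(20171127)

# Same five distributions and settings as in the question (base-R data.frame).
dists <- data.frame(
  samp_distA = round(rnorm(n = 1000, mean = 17, sd = 4)),
  samp_distB = round(rnorm(n = 1000, mean = 13, sd = 4)),
  samp_distC = round(rnorm(n = 1000, mean = 13, sd = 4)),
  samp_distD = round(rbeta(n = 1000, 2, 8) * 10),
  samp_distE = round(rnorm(n = 1000, mean = 8, sd = 3))
)
n.times <- c(20, 15, 35, 8, 6)
probs <- list(
  A = c(0.80, 0.05, 0.05, 0.05, 0.05),
  B = c(0.05, 0.80, 0.05, 0.05, 0.05),
  C = c(0.05, 0.05, 0.80, 0.05, 0.05),
  D = c(0.05, 0.05, 0.05, 0.80, 0.80),
  E = c(0.05, 0.05, 0.05, 0.05, 0.80)
)

# Hypothetical helper: choose ONE column using the weights w,
# then draw all n values from that column only.
pick_then_sample <- function(n, w, pool) {
  k <- sample(seq_along(pool), size = 1, prob = w)
  draws <- sample(pool[[k]], size = n, replace = TRUE)
  attr(draws, "source") <- names(pool)[k]  # record which column was used
  draws
}

set.seed(123)
former <- mapply(pick_then_sample, n.times, probs,
                 MoreArgs = list(pool = dists), SIMPLIFY = FALSE)

# Which distribution each group ended up drawing from.
sapply(former, attr, "source")
```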
Base R equivalent of the map2() code:
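set.seed(123)
mapply(
  sample,
  n.times,                                 # size for each group
  lapply(probs, rep, each = nrow(dists)),  # per-value weights for each group
  MoreArgs = list(x = unlist(dists, use.names = FALSE), replace = TRUE),
  SIMPLIFY = FALSE                         # always return a list, even if all sizes match
)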
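mapply() simplifies its result by default (SIMPLIFY = TRUE). With the unequal sizes in n.times the result is a list either way, but if every group asked for the same number of draws, the default would return a matrix instead; passing SIMPLIFY = FALSE keeps a list in both cases. A small sketch with hypothetical equal sizes and uniform weights:

```r
# Hypothetical equal sizes: two groups, three draws each, uniform weights.
sizes <- c(3, 3)
weights <- list(rep(1, 10), rep(1, 10))

set.seed(42)
simplified <- mapply(sample, sizes, weights,
                     MoreArgs = list(x = 1:10, replace = TRUE))

set.seed(42)
kept <- mapply(sample, sizes, weights,
               MoreArgs = list(x = 1:10, replace = TRUE),
               SIMPLIFY = FALSE)

print(dim(simplified))  # 3 2: one column per group
print(class(kept))      # "list"
```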
Edit
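In response to your follow-up question in the comments ("run the function multiple times for each person, e.g. as output, person A has 10 lists of randomly sampled values, each of length 20 (similarly for B, C, D and E, perhaps with a predefined, different number of lists for each person)"):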
n.reps <- c(A = 10, B = 1, C = 3, D = 2, E = 1)  # number of draws per person
set.seed(123)
pmap(
  list(n.reps, n.times, map(probs, rep, each = nrow(dists))),
  function(.x, .y, .z) replicate(
    .x,                                                        # repetitions for this person
    sample(flatten_dbl(dists), .y, replace = TRUE, prob = .z), # .y = size, .z = weights
    simplify = FALSE                                           # keep a list of vectors
  )
)
# $A
# $A[[1]]
# [1] 15 20 16 20 16 14 17 20 21 22 18 19 15 14 18 19 16 20 9 16
#
# $A[[2]]
# [1] 13 9 11 19 25 19 11 18 16 19 16 21 15 12 11 11 9 13 20 1
#
# $A[[3]]
# [1] 15 20 13 20 13 11 16 16 14 19 18 10 21 11 12 16 18 10 20 14
#
# $A[[4]]
# [1] 16 19 14 11 17 9 20 11 19 13 11 16 8 11 10 18 27 22 20 4
#
# $A[[5]]
# [1] 12 18 16 19 13 13 23 19 21 14 22 8 9 19 16 19 9 14 13 20
#
# $A[[6]]
# [1] 18 26 16 15 21 17 15 19 14 18 19 25 5 16 7 19 21 15 23 16
#
# $A[[7]]
# [1] 12 26 20 12 7 5 13 14 19 7 16 12 11 27 22 18 11 17 11 16
#
# $A[[8]]
# [1] 21 18 24 22 18 0 15 3 9 16 16 11 16 20 22 18 18 20 16 21
#
# $A[[9]]
# [1] 15 20 11 16 16 21 12 20 17 9 18 10 22 17 12 0 18 16 23 20
#
# $A[[10]]
# [1] 16 22 15 4 7 19 18 13 15 1 7 18 21 1 20 21 15 12 20 15
#
#
# $B
# $B[[1]]
# [1] 9 5 8 17 9 10 7 13 12 11 9 21 10 15 12
#
#
# $C
# $C[[1]]
# [1] 15 15 16 13 19 14 16 15 11 15 19 16 19 12 6 12 10 12 1 18 9 10 18 11 19
# [26] 9 6 19 18 12 9 18 14 12 7
#
# $C[[2]]
# [1] 5 14 16 10 8 13 8 18 22 18 14 12 13 10 19 12 15 10 16 13 16 9 15 6 15
# [26] 14 4 9 11 11 3 15 18 10 14
#
# $C[[3]]
# [1] 13 8 12 9 6 9 2 7 8 12 2 11 20 10 1 14 14 11 11 1 13 13 18 14 12
# [26] 21 11 3 7 7 13 13 11 7 14
#
#
# $D
# $D[[1]]
# [1] 11 1 1 7 12 6 0 8
#
# $D[[2]]
# [1] 4 1 7 15 2 2 8 9
#
#
# $E
# $E[[1]]
# [1] 7 8 6 11 10 6
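For comparison with the pmap() version, the same nested draws can be sketched in base R with Map() and replicate(). This is a restatement rather than the answer's code; the data and settings are recreated from the question so the block runs without the tidyverse, and the last two lines show the shape: one list per person with n.reps entries, each of length n.times.

```r
set.seed(20171127)

# Same five distributions as in the question, as a base-R data.frame.
dists <- data.frame(
  samp_distA = round(rnorm(n = 1000, mean = 17, sd = 4)),
  samp_distB = round(rnorm(n = 1000, mean = 13, sd = 4)),
  samp_distC = round(rnorm(n = 1000, mean = 13, sd = 4)),
  samp_distD = round(rbeta(n = 1000, 2, 8) * 10),
  samp_distE = round(rnorm(n = 1000, mean = 8, sd = 3))
)
n.times <- c(20, 15, 35, 8, 6)
n.reps  <- c(A = 10, B = 1, C = 3, D = 2, E = 1)
probs <- list(
  A = c(0.80, 0.05, 0.05, 0.05, 0.05),
  B = c(0.05, 0.80, 0.05, 0.05, 0.05),
  C = c(0.05, 0.05, 0.80, 0.05, 0.05),
  D = c(0.05, 0.05, 0.05, 0.80, 0.80),
  E = c(0.05, 0.05, 0.05, 0.05, 0.80)
)

# All 5000 values stacked column by column (what flatten_dbl(dists) returns).
pooled <- unlist(dists, use.names = FALSE)

set.seed(123)
res <- Map(
  function(reps, size, w) {
    # Expand the 5 distribution weights to one weight per pooled value,
    # then repeat the weighted draw `reps` times, keeping a list of vectors.
    row_w <- rep(w, each = nrow(dists))
    replicate(reps, sample(pooled, size, replace = TRUE, prob = row_w),
              simplify = FALSE)
  },
  n.reps, n.times, probs
)

lengths(res)          # number of draws per person
lapply(res, lengths)  # length of each draw
```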
Regarding "r - purrr; sampling from multiple columns with a list of probabilities", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47515159/