algorithm - 水库采样了解概率

标签 algorithm random probability probability-theory reservoir-sampling

我无法理解水库采样所涉及的概率。下面是我看到的几乎无处不在的示例代码:

    1/*
    2  S has items to sample, R will contain the result, K number of items to select
    3*/
    4ReservoirSample(S[1..n], R[1..k])
    5  // fill the reservoir array
    6  for i = 1 to k
    7      R[i] := S[i]
    8
    9 // replace elements with gradually decreasing probability
    10  for i = k+1 to n
    11    j := random(1, i)   // important: inclusive range
    12    if j <= k
    13        R[j] := S[i]

我的理解对吗(?): 假设我们有 k=3 并且输入 = [100, 200, 300, 400, 500] 并且 i 当前处于 500 索引。 500 替换 300 在水库中的概率(大小为 3)= 300 在水库中被选中的概率 * 500 被选中的概率,只有当随机函数返回的索引小于或等于 3 时才有可能5 个选择 = 1/3 * 3/5 = 1/5

最佳答案

你的理解好像不对。 500 人被选中的概率与 300 人被选中的概率无关。

您应该查看 gfg link for the same. “这是如何运作的?”部分,其中说明如下:

Case 1: For last n-k stream items, i.e., for stream[i] where k <= i < n For every such stream item stream[i], we pick a random index from 0 to i and if the picked index is one of the first k indexes, we replace the element at picked index with stream[i]

To simplify the proof, let us first consider the last item. The probability that the last item is in final reservoir = The probability that one of the first k indexes is picked for last item = k/n (the probability of picking one of the k items from a list of size n)

Let us now consider the second last item. The probability that the second last item is in final reservoir[] = [Probability that one of the first k indexes is picked in iteration for stream[n-2]] X [Probability that the index picked in iteration for stream[n-1] is not same as index picked for stream[n-2] ] = [k/(n-1)]*[(n-1)/n] = k/n.

Similarly, we can consider other items for all stream items from stream[n-1] to stream[k] and generalize the proof.

Case 2: For first k stream items, i.e., for stream[i] where 0 <= i < k The first k items are initially copied to reservoir[] and may be removed later in iterations for stream[k] to stream[n]. The probability that an item from stream[0..k-1] is in final array = Probability that the item is not picked when items stream[k], stream[k+1], …. stream[n-1] are considered = [k/(k+1)] x [(k+1)/(k+2)] x [(k+2)/(k+3)] x … x [(n-1)/n] = k/n

关于algorithm - 水库采样了解概率,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35326289/

相关文章:

c - 带缓存的递归斐波那契计算 - 找不到错误

javascript - 如何创建一个 for 循环递增一个数字,直到找到余数为 0 的那个?

java - 为什么这个循环排序实现有一个错误?

c++ - Windows 中的 Qt qrand in 循环生成相同的数字

c - 用1随机填充100X100矩阵的100格

algorithm - 找到总和为特定值的所有子集

jquery - 向多个元素添加类

python - 如何在Python中生成一个范围内但偏向于某些特定数字的随机数?

java - 6 次迭代后,在 for 循环中将新值赋给 double

python - Python中的概率分布