我了解该算法的工作原理。但我不明白为什么它是正确的。假设我们只需要选择一个元素。这是我找到的证据
at every step N, keep the next element in the stream with probability 1/N. This means that we have an (N-1)/N probability of keeping the element we are currently holding on to, which means that we keep it with probability (1/(N-1)) * (N-1)/N = 1/N.
除了最后一部分,我都明白了。为什么我们要乘以概率?
最佳答案
因为Pr[A AND B] == Pr[A] * Pr[B]
,假设A
和B
是独立的(因为他们在这里)。选择该元素并且以后不替换它的概率是这两种可能性概率的乘积。
关于algorithm - 水库采样: why is it selected uniformly at random,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29131567/