java - 这里如何使用油藏采样？

我正在解决 LeetCode 上的一个问题，该问题基于储层采样(RS)的概念。问题是:

Given an array (that may contain duplicates), randomly output the index of a given target number. The target number is guaranteed to exist in the array. For e.g., if the input is: {1,2,3,3,3} and the target is 3, then any of the indices 2, 3 and 4 must be returned with equal probability. Similarly, if the target is 1, then the index 0 should be returned (obviously with probability of 1).

获得最多支持的代码如下:

public class Solution {

    int[] nums;
    Random rnd;

    public Solution(int[] nums) {
        this.nums = nums;
        this.rnd = new Random();
    }

    public int pick(int target) {
        int result = -1;
        int count = 0;
        for (int i = 0; i < nums.length; i++) {
            if (nums[i] != target)
                continue;
            if (rnd.nextInt(++count) == 0)
                result = i;
        }

        return result;
    }
}

据我了解 RS，我们首先选择 n 个数字。然后，当我们选择下一个数字时，我们需要以1/(n+1)的概率选择它。如何做到:

rnd.nextInt(++count) == 0

选择一个概率为1/(n+1)的数字？我知道我们执行 ++count 使其变为 n+1，但是分子发生了什么？为什么是 rnd.nextInt() 而不是简单的 1？

编辑:问题链接 here .

最佳答案

好的，我们有这个语句 rnd.nextInt(++count) == 0

第一次命中目标时，count 增加到 1，语句的计算结果为

rnd.nextInt(1) == 0

它总是返回true，因为参数1是一个独占上限，并且它可能返回的唯一可能值是0。此时，当前索引被设置为结果，因为我们不知道数组中是否还有其他目标。

第二次遇到目标时，count 变为 2，语句的计算结果为

rnd.nextInt(2) == 0

这个返回 0 或 1。因此，该索引有 50% 的机会被选为结果。

因此数组中有两个已知目标，并且都有 50% 的机会被设置为结果。第二个目标有这个机会是显而易见的，但对于第一个目标我们可以做一个小计算。第一次代码命中 if 语句时，它有 100% 的机会被选中，第二次有 50% 的机会第二个目标不成为结果(留下 50第一个目标保留为结果的机会百分比)。这些机会综合起来导致

first target as result = 100% x 50% = 50%

可以对 2 个以上的目标重复这些步骤，但这会使解释变得更加复杂。最后，数组中的所有 N 个目标都有 1/N 机会成为结果。

关于java - 这里如何使用油藏采样？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/45761582/

java - 这里如何使用油藏采样？

上一篇：java - JAXB-2 Maven 插件、模式包含和绑定(bind)

下一篇：java - 如果我将浏览器 cookie 复制到另一台电脑上，会发生什么情况？