java - 从大小为 N 的数组生成一组 M 元素

更新:根据评论让我们做一些澄清。

我正在尝试了解以下任务的解决方案: 从大小为 N 的数组中随机生成一组 M 个元素。每个元素必须具有相等的被选择概率。

我找到了以下解决方案(我已经阅读 this question ，但它没有回答我的问题):

int rand(Random random, int min, int max) {
  return random.nextInt(1 + max - min) + min;
}

char[] generateArray(char[] original, int subsetSize) {
  char[] subset = new char[subsetSize];
  Random random = new Random();

  for (int i = 0; i < subsetSize; i++) {
    subset[i] = original[i];
  }
  for (int i = subsetSize; i < original.length; i++) {
    int r = rand(random,0, i);
    boolean takeIthElement = r < subsetSize;
    if (takeIthElement) {
      subset[r] = original[i];
    }
  }

  return subset;
}
// rand() function returns inclusive value 
// i.e. rand(0, 5) will return from 0 to 5

此代码可在“破解编码面试”一书中找到(困难部分，任务 3)。作者解释如下:

Suppose we have an algorithm that can pull a random set of m elements from an array of size n - 1. How can we use this algorithm to pull a random set of m elements from an array of size n? We can first pull a random set of size m from the first n - 1 elements. Then, we just need to decide if array[n] should be inserted into our subset (which would require pulling out a random element from it). An easy way to do this is to pick a random number k from 0 through n. If k < m, then insert array[n] into subset[k]. This will both "fairly" (i.e., with proportional probability) insert array[n] into the subset and "fairly" remove a random element from the subset. This is even cleaner to write iteratively. In this approach, we initialize an array subset to be the first m elements in original. Then, we iterate through the array, starting at element m, inserting array[i] into the subset at (random) position k whenever k < m.

我认为作者想说我们需要生成不是设置，而是数组。所以，我认为正确的任务描述应该是: 从大小为 N 的数组中随机生成一个包含 M 个元素的数组。每个元素必须具有相等的被选择概率。

如果属实，则上面的代码将无法正常工作。原因:

例如我们有一个数组 {'1', '2', 'a', 'b'}和m = 2
因此，我们应该有相同的概率生成以下集合:

{1, 2}; {2, 1}; {1, a}; {a, 1}; {1, b}; {b, 1}; {a, 2}; {2, a}; {b, 2}; {2, b}; {a, b}; {b, a}

我担心的是该函数永远不会生成以下集合: {2, 1}; {2, a}; {2, b}

所以，这意味着它是不正确的。

最佳答案

How can I prove it with math?

第二个 for 循环运行两次，首先 i 等于 2，然后 i 等于 3。

当i为2时，r变为0、1或2，每个概率为1/3。因此，字符 a 会被移动到索引 0 或 1 处的结果中，或者根本不移动，每个概率为 1/3。现在它是 [a, 2]、[1, a] 或 [1, 2]。

当i为3时，r为0、1、2或3。b以1/4的概率移动到索引0，以1/4的概率移动到索引1，并且以1/2的概率不移动到任何地方。

在下表中我给出了所有可能情况下的结果。 r 下的值 0、1 和 2 是第一次迭代中的可能值 (i = 2)。 r 右侧是第二次迭代中的可能值。

r    0       1       2       3
0  [b, 2]  [a, b]  [a, 2]  [a, 2]
1  [b, a]  [1, b]  [1, a]  [1, a]
2  [b, 2]  [1, b]  [1, 2]  [1, 2]

因此，在表中您可以看到，如果 r 两次都为 0，您的方法将返回 [b, 2] 等。

表中 12 个单元格的概率相等，即 1/12。让我们检查一下:[1, 2]、[1, a]、[1, b]、[a, 2] 和 [b, 2] 各有两次。 [a, b] 和 [b, a] 各出现一次，但它们是同一组，因此该组也出现两次。这涵盖了所有可能的子集，因此它们的可能性相同。

关于java - 从大小为 N 的数组生成一组 M 元素，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/51173120/

java - 从大小为 N 的数组生成一组 M 元素

更新:根据评论让我们做一些澄清。

上一篇：java - 有没有办法在不破坏反序列化过程的情况下忽略 JsonProcessingException

下一篇：java - 简单的多边形内点测试，边缘行为一致