algorithm - 对数组中最大数的索引进行采样，概率为 1/(最大数的数量)

这是我最近遇到的面试问题之一。程序返回数组中最大数的索引[注意:数组可能包含也可能不包含最大数的多个副本]，这样每个索引(包含最大数)的概率为 1/no of max numbers将被退回。

例子:

[-1 3 2 3 3]，每个位置[1,3,4]都有1/3的概率返回(三个3)
[ 2 4 6 6 3 1 6 6 ]，每个[2,3,6,7]都有1/4的概率返回(对应6的位置)。

首先，我给出了 O(n) 时间和 O(n) 空间算法，我在其中收集最大索引集，然后从该集中返回一个随机数。但是他要求一个 O(n) 时间和 O(1) 复杂度的程序，然后我想出了这个。

int find_maxIndex(vector<int> a)
{
     max = a[0];
     max_index = 0;
     count = 0;

     for(i = 1 to a.size())
     {
         if(max < a[i])
         {
             max = a[i];
             count = 0;
         }
         if(max == a[i])
         {
              count++;
              if(rand < 1/count) //rand = a random number in the range of [0,1]
                  max_index = i;
         }
      }
      return max_index;
}

我给了他这个解决方案。但我怀疑这个过程是否会以相等的概率选择最大数字的索引之一。希望我很清楚。还有其他方法可以做到这一点吗？

最佳答案

您拥有的是 Reservoir sampling !还有另一种易于理解的解决方案，但需要两次传递。

int find_maxIndex(vector<int> a){
    int count = 1;
    int maxElement = a[0];
    for(int i = 1; i < a.size(); i++){
        if(a[i] == maxElement){
            count ++;
        } else if(a[i] > maxElement){
            count = 1;
            maxElement = a[i];
        }
    }
    int occurrence = rand() % count + 1;
    int occur = 0;
    for(int i = 0; i < a.size(); i++){
        if(a[i] == maxElement){
            occur++;
            if(occur == occurrence) return i;
        }
    }
}

算法很简单，先求最大元素在第一遍中出现的次数。并选择一个随机事件并返回该事件的索引。虽然需要两次传递，但非常容易理解。

关于algorithm - 对数组中最大数的索引进行采样，概率为 1/(最大数的数量)，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/21717449/