arrays - 具有未知数量项目的二进制搜索

标签 arrays algorithm binary-search

假设您不知道要搜索的元素的数量,并且给定了一个 API,该 API 接受一个索引,如果您在边界之外将返回 null(如此处使用 getWordFromDictionary 方法实现的),您如何执行二进制搜索并实现客户端程序的 isWordInDictionary() 方法?

此解决方案有效,但我最终在找到初始高索引值的级别之上进行了串行搜索。通过较低的值范围进行搜索的灵感来自 this answer .我还查看了 Reflector(C# 反编译器)中的 BinarySearch,但它的列表长度已知,因此仍在寻找填补空白的方法。

private static string[] dictionary;

static void Main(string[] args)
{
    dictionary = System.IO.File.ReadAllLines(@"C:\tmp\dictionary.txt");

    Console.WriteLine(isWordInDictionary("aardvark", 0));
    Console.WriteLine(isWordInDictionary("bee", 0));
    Console.WriteLine(isWordInDictionary("zebra", 0));
    Console.WriteLine(isWordInDictionaryBinary("aardvark"));
    Console.WriteLine(isWordInDictionaryBinary("bee"));
    Console.WriteLine(isWordInDictionaryBinary("zebra"));
    Console.ReadLine();
}

static bool isWordInDictionaryBinary(string word)
{
    // assume the size of the dictionary is unknown

    // quick check for empty dictionary
    string w = getWordFromDictionary(0);
    if (w == null)
        return false;

    // assume that the length is very big.
    int low = 0;
    int hi = int.MaxValue;

    while (low <= hi)
    {
        int mid = (low + ((hi - low) >> 1));
        w = getWordFromDictionary(mid);

        // If the middle element m you select at each step is outside 
        // the array bounds (you need a way to tell this), then limit
        // the search to those elements with indexes small than m.
        if (w == null)
        {
            hi = mid;
            continue;
        }

        int compare = String.Compare(w, word);
        if (compare == 0)
            return true;

        if (compare < 0)
            low = mid + 1;
        else
            hi = mid - 1;
    }

    // punting on the search above the current value of hi 
    // to the (still unknown) upper limit
    return isWordInDictionary(word, hi);
}


// serial search, works good for small number of items
static bool isWordInDictionary(string word, int startIndex) 
{
    // assume the size of the dictionary is unknown
    int i = startIndex;
    while (getWordFromDictionary(i) != null)
    {
        if (getWordFromDictionary(i).Equals(word, StringComparison.OrdinalIgnoreCase))
            return true;
        i++;
    }

    return false;
}

private static string getWordFromDictionary(int index)
{
    try
    {
        return dictionary[index];
    }
    catch (IndexOutOfRangeException)
    {
        return null;
    }
}

答案后的最终代码

static bool isWordInDictionaryBinary(string word)
{
    // assume the size of the dictionary is unknown

    // quick check for empty dictionary
    string w = getWordFromDictionary(0);
    if (w == null)
        return false;

    // assume that the number of elements is very big
    int low = 0;
    int hi = int.MaxValue;

    while (low <= hi)
    {
        int mid = (low + ((hi - low) >> 1));
        w = getWordFromDictionary(mid);

        // treat null the same as finding a string that comes 
        // after the string you are looking for
        if (w == null)
        {
            hi = mid - 1;
            continue;
        }

        int compare = String.Compare(w, word);
        if (compare == 0)
            return true;

        if (compare < 0)
            low = mid + 1;
        else
            hi = mid - 1;
    }

    return false;
}

最佳答案

您可以分两个阶段实现二分查找。在第一阶段,您会增加要搜索的区间的大小。一旦您检测到超出范围,就可以在您找到的最新区间内进行正常的二分搜索。像这样:

bool isPresentPhase1(string word)
{
  int l = 0, d = 1;
  while( true ) // you should eventually reach an index out of bounds
  {
    w = getWord(l + d);
    if( w == null )
      return isPresentPhase2(word, l, l + d - 1);
    int c = String.Compare(w, word);
    if( c == 0 )
      return true;
    else if( c < 0 )
      isPresentPhase2(value, l, l + d - 1);
    else
    {
      l = d + 1;
      d *= 2;
    } 
  }
}

bool isPresentPhase2(string word, int lo, int hi)
{
    // normal binary search in the interval [lo, hi]
}

关于arrays - 具有未知数量项目的二进制搜索,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5590921/

相关文章:

c - 使用第一个元素作为枢轴的 C 中的快速排序实现

arrays - 有序列表的二进制搜索(插入)的复杂性

objective-c - 在 Objective-C 中将 NSArray 转换为 NSString

java - 如何为 pacman 创建路径追踪算法?

c++ - 尝试使用 const int std::array 时出错

c++ - STL 哈希函数

python - 使用后缀数组查找子字符串的一次出现

python - 在列表中执行二分搜索 - Python

python - 如何有效地切片 numpy 数组

php - 序列化与内爆