algorithm - 哈希查找和二分查找哪个更快?

标签 algorithm hash hashmap lookup binary-search

当给定一组静态对象(静态的意思是一旦加载它就很少改变),需要重复并发查找以获得最佳性能,这是更好的,HashMap 或使用一些自定义比较器进行二进制搜索的数组?

答案是对象还是结构类型的函数?哈希和/或相等函数性能?哈希唯一性?列表大小? Hashset 大小/集合大小?

我正在查看的集的大小可以在 500k 到 10m 之间的任何地方 - 以防该信息有用。

虽然我正在寻找 C# 答案,但我认为真正的数学答案不在于语言,所以我没有包含该标记。但是,如果需要注意 C# 特定事项,则需要该信息。

最佳答案

对于非常小的集合,差异可以忽略不计。在您范围的低端(50 万项),如果您进行大量查找,您将开始看到差异。二进制搜索将是 O(log n),而哈希查找将是 O(1),amortized。这与真正的常量不同,但您仍然必须有一个非常糟糕的哈希函数才能获得比二进制搜索更差的性能。

(当我说“可怕的散列”时,我的意思是:

hashCode()
{
    return 0;
}

是的,它本身速度非常快,但会使您的 HashMap 成为一个链表。)

ialiashkevich 使用数组和字典编写了一些 C# 代码来比较这两种方法,但它使用 Long 值作为键。我想测试在查找期间实际执行哈希函数的东西,所以我修改了该代码。我将其更改为使用字符串值,并将填充和查找部分重构为它们自己的方法,以便在探查器中更容易查看。我还保留了使用 Long 值的代码,作为比较点。最后,我摆脱了自定义二进制搜索功能,并使用了 Array 类中的那个。

这是代码:

class Program
{
    private const long capacity = 10_000_000;

    private static void Main(string[] args)
    {
        testLongValues();
        Console.WriteLine();
        testStringValues();

        Console.ReadLine();
    }

    private static void testStringValues()
    {
        Dictionary<String, String> dict = new Dictionary<String, String>();
        String[] arr = new String[capacity];
        Stopwatch stopwatch = new Stopwatch();

        Console.WriteLine("" + capacity + " String values...");

        stopwatch.Start();

        populateStringArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Populate String Array:      " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        populateStringDictionary(dict, arr);

        stopwatch.Stop();
        Console.WriteLine("Populate String Dictionary: " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        Array.Sort(arr);

        stopwatch.Stop();
        Console.WriteLine("Sort String Array:          " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchStringDictionary(dict, arr);

        stopwatch.Stop();
        Console.WriteLine("Search String Dictionary:   " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchStringArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Search String Array:        " + stopwatch.ElapsedMilliseconds);

    }

    /* Populate an array with random values. */
    private static void populateStringArray(String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            arr[i] = generateRandomString(20) + i; // concatenate i to guarantee uniqueness
        }
    }

    /* Populate a dictionary with values from an array. */
    private static void populateStringDictionary(Dictionary<String, String> dict, String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            dict.Add(arr[i], arr[i]);
        }
    }

    /* Search a Dictionary for each value in an array. */
    private static void searchStringDictionary(Dictionary<String, String> dict, String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            String value = dict[arr[i]];
        }
    }

    /* Do a binary search for each value in an array. */
    private static void searchStringArray(String[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            int index = Array.BinarySearch(arr, arr[i]);
        }
    }

    private static void testLongValues()
    {
        Dictionary<long, long> dict = new Dictionary<long, long>(Int16.MaxValue);
        long[] arr = new long[capacity];
        Stopwatch stopwatch = new Stopwatch();

        Console.WriteLine("" + capacity + " Long values...");

        stopwatch.Start();

        populateLongDictionary(dict);

        stopwatch.Stop();
        Console.WriteLine("Populate Long Dictionary: " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        populateLongArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Populate Long Array:      " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchLongDictionary(dict);

        stopwatch.Stop();
        Console.WriteLine("Search Long Dictionary:   " + stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        searchLongArray(arr);

        stopwatch.Stop();
        Console.WriteLine("Search Long Array:        " + stopwatch.ElapsedMilliseconds);
    }

    /* Populate an array with long values. */
    private static void populateLongArray(long[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            arr[i] = i;
        }
    }

    /* Populate a dictionary with long key/value pairs. */
    private static void populateLongDictionary(Dictionary<long, long> dict)
    {
        for (long i = 0; i < capacity; i++)
        {
            dict.Add(i, i);
        }
    }

    /* Search a Dictionary for each value in a range. */
    private static void searchLongDictionary(Dictionary<long, long> dict)
    {
        for (long i = 0; i < capacity; i++)
        {
            long value = dict[i];
        }
    }

    /* Do a binary search for each value in an array. */
    private static void searchLongArray(long[] arr)
    {
        for (long i = 0; i < capacity; i++)
        {
            int index = Array.BinarySearch(arr, arr[i]);
        }
    }

    /**
     * Generate a random string of a given length.
     * Implementation from https://stackoverflow.com/a/1344258/1288
     */
    private static String generateRandomString(int length)
    {
        var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        var stringChars = new char[length];
        var random = new Random();

        for (int i = 0; i < stringChars.Length; i++)
        {
            stringChars[i] = chars[random.Next(chars.Length)];
        }

        return new String(stringChars);
    }
}

以下是几个不同大小的集合的结果。 (时间以毫秒为单位。)

500000 Long values...
Populate Long Dictionary: 26
Populate Long Array: 2
Search Long Dictionary: 9
Search Long Array: 80

500000 String values...
Populate String Array: 1237
Populate String Dictionary: 46
Sort String Array: 1755
Search String Dictionary: 27
Search String Array: 1569

1000000 Long values...
Populate Long Dictionary: 58
Populate Long Array: 5
Search Long Dictionary: 23
Search Long Array: 136

1000000 String values...
Populate String Array: 2070
Populate String Dictionary: 121
Sort String Array: 3579
Search String Dictionary: 58
Search String Array: 3267

3000000 Long values...
Populate Long Dictionary: 207
Populate Long Array: 14
Search Long Dictionary: 75
Search Long Array: 435

3000000 String values...
Populate String Array: 5553
Populate String Dictionary: 449
Sort String Array: 11695
Search String Dictionary: 194
Search String Array: 10594

10000000 Long values...
Populate Long Dictionary: 521
Populate Long Array: 47
Search Long Dictionary: 202
Search Long Array: 1181

10000000 String values...
Populate String Array: 18119
Populate String Dictionary: 1088
Sort String Array: 28174
Search String Dictionary: 747
Search String Array: 26503

为了比较,这里是程序最后一次运行的分析器输出(1000 万条记录和查找)。我强调了相关功能。他们非常同意上面的秒表计时指标。

Profiler output for 10 million records and lookups

您可以看到字典查找比二进制搜索快得多,并且(正如预期的那样)集合越大,差异越明显。因此,如果您有一个合理的散列函数(相当快且几乎没有冲突),散列查找应该优于二分查找这个范围内的集合。

关于algorithm - 哈希查找和二分查找哪个更快?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/360040/

相关文章:

javascript - Angular 中的分隔符

java - 使用 Java 的浏览按钮导入文件

java - 为 HashMap 创建自定义迭代器

java - 将 HashMap 转换为排序的 ArrayList

java - 以时间效率的方式将大数组中的所有零值移动到其前面部分

ruby - 如何重构这个简单的 Ruby 算法

python - 是否可以将 pandas 系列附加到列表中

algorithm - 将旧密码迁移到 CakePHP 3

java - 使用以我的对象作为键的 HashMap

algorithm - 我可以使用什么算法来生成简单的人类可读的容错字符串?