.net - Sort() 和 BinarySearch() 与整数/字符串的性能比较

标签 .net vb.net performance generics

最初我想问对整数进行排序是否比对字符串进行排序更快。 但我自己已经回答了这个问题,我对巨大的差异感到惊讶。 为什么排序和 BinarySearch 整数与字符串相比要快得多?

使用 1.000.000 Int32/Strings 的 (VB.Net) 测试:

Private Function CheckIntBinarySearch() As TimeSpan
    Dim watch As New System.Diagnostics.Stopwatch()
    Dim rnd As New Random(Date.Now.Millisecond)
    Dim intCol1 As New List(Of Int32)
    Dim intCol2 As New List(Of Int32)
    Dim contains As Int32
    For i As Int32 = 1 To 1000000
        intCol1.Add(rnd.Next(1, 1000000))
    Next
    For i As Int32 = 1 To 1000000
        intCol2.Add(rnd.Next(1, 1000000))
    Next
    Me.output.WriteLine("Integers sorting ...")
    watch.Start()
    intCol1.Sort()
    watch.Stop()
    Me.output.WriteLine("Sorting finished: " & watch.Elapsed.TotalSeconds & " seconds elapsed.")

    Me.output.WriteLine("Integers BinarySearch ...")
    watch.Start()
    For Each Val As Int32 In intCol2
        If intCol1.BinarySearch(Val) > -1 Then contains += 1
    Next
    watch.Stop()
    Me.output.WriteLine("BinarySearch finished(contains " & contains & "): " & watch.Elapsed.TotalSeconds & " seconds elapsed.")
    Return watch.Elapsed
End Function

Private Function CheckStringBinarySearch() As TimeSpan
    Dim watch As New System.Diagnostics.Stopwatch()
    Dim rnd As New Random(Date.Now.Millisecond)
    Dim stringCol1 As New List(Of String)
    Dim stringCol2 As New List(Of String)
    Dim contains As Int32
    For i As Int32 = 1 To 1000000
        stringCol1.Add(rnd.Next(1, 1000000).ToString)
    Next
    For i As Int32 = 1 To 1000000
        stringCol2.Add(rnd.Next(1, 1000000).ToString)
    Next
    Me.output.WriteLine("Strings sorting ...")
    watch.Start()
    stringCol1.Sort()
    watch.Stop()
    Me.output.WriteLine("Sorting finished: " & watch.Elapsed.TotalSeconds & " seconds elapsed.")
    Me.output.WriteLine("Strings BinarySearch ...")
    watch.Start()
    For Each Val As String In stringCol2
        If stringCol1.BinarySearch(Val) > -1 Then contains += 1
    Next
    watch.Stop()
    Me.output.WriteLine("BinarySearch finished(contains " & contains & "): " & watch.Elapsed.TotalSeconds & " seconds elapsed.")
    Return watch.Elapsed
End Function

检查性能 5 次:

For i As Int32 = 1 To 5
   intChecks.Add(CheckIntBinarySearch())
Next
For i As Int32 = 1 To 5
   stringChecks.Add(CheckStringBinarySearch())
Next

输出:

    1.)Integers sorting ...
    Sorting finished: 0,2292863 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 630857): 0,9365744 seconds elapsed.
    2.)Integers sorting ...
    Sorting finished: 0,2287382 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 632600): 0,9053134 seconds elapsed.
    3.)Integers sorting ...
    Sorting finished: 0,2318829 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 631475): 0,9038594 seconds elapsed.
    4.)Integers sorting ...
    Sorting finished: 0,2308994 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 632346): 0,9011047 seconds elapsed.
    5.)Integers sorting ...
    Sorting finished: 0,2266423 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 632982): 0,893541 seconds elapsed.
    1.)Strings sorting ...
    Sorting finished: 6,5661916 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 632579): 12,9545657 seconds elapsed.
    2.)Strings sorting ...
    Sorting finished: 6,5641975 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 631478): 13,0184132 seconds elapsed.
    3.)Strings sorting ...
    Sorting finished: 6,4281382 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 631775): 12,7684214 seconds elapsed.
    4.)Strings sorting ...
    Sorting finished: 6,9455087 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 631478): 13,7057234 seconds elapsed.
    5.)Strings sorting ...
    Sorting finished: 6,6707111 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 632346): 13,0493649 seconds elapsed.
  • Int32 平均排序:0,22948982
  • String 平均排序:6,63494942
  • Int32 BinarySearch 平均:0,90807858
  • String BinarySearch 平均:13,09929772

结论:

  • 排序整数比排序字符串快 29
  • BinarySearch Integers 比 BinarySearch Strings 快 14,4

为什么? 考虑拥有大量“字符串整数”(“1”,“2”,“3”,...)。在对它们进行排序和搜索之前将它们解析为整数会更好吗?将字符串解析为整数的成本是多少?好的,我想这是另一个问题。

最佳答案

字符串比较存在许多整数比较转义的问题。

  1. 考虑一下如何编写字符串比较。首先,您可以检查引用是否为空,然后您可以检查两个字符串的长度是否匹配,然后您可以一次比较每个数字。这至少是 2 次比较,其中整数只使用一次。
  2. 考虑比较“111112”和“111113”。对于字符串,在得到结果之前需要进行 6 次比较(每个字符一次)。对于整数,它只是一种比较。
  3. 可以优化整数比较和数组以使用特定于寄存器的指令,因此可能会直接在 CPU 缓存中进行大量排序/搜索 - 对于字符串,可能需要调用多个方法来访问字符、检查长度等。<
  4. 如果您需要特定于语言环境的比较,或者可以处理“ß”与“ss”或“æ”与“ae”相同的比较,您甚至不能使用长度优化。
  5. 字符串是引用类型,整数是值类型,因此可能存在取消引用。

此列表并不详尽,但至少可以了解问题。

顺便说一句,不影响性能 - 你应该知道比较字符串时“12”<“2”(这是按字典顺序发生的),所以上面的代码可能不是你想要的。

关于.net - Sort() 和 BinarySearch() 与整数/字符串的性能比较,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/4183200/

相关文章:

c# - 用于解析完全限定类型名称的现有类/方法

.net - 自动浏览器测试 : How to test JavaScript in web pages?

.net - 在不移动鼠标的情况下将鼠标点击发送到特定位置

javascript - 访问 chrome/firefox 图像缓存

ios - 究竟什么代码必须放在 iOS 的主线程上?

c# - 禁用循环返回值的 C# 优化

asp.net - 如何将未经授权的用户重定向到登录页面?

vb.net - 映射所有以特定 URL 部分/关键字开头的路由

.net - 使用 ApplicationSettings 存储 WinForms RadioButtons 的 Checked 属性

r - 为什么我的 Rcpp 代码没有快多少?