excel - 查找列中重复条目的行

我有一个电子表格，其中包含 0 到 90000 之间的数字列表，这些数字以字符串形式存储在 R 列中。

由另一个系统分配的每个号码都应该是唯一的。大约 5% 以前使用过一次或多次。我无法控制其他系统。

每个月我都会向此列添加大约 50 个数字。我需要确定列表中是否已存在任何新数字(包括添加的新数字)，并确定电子表格中包含第一个重复项和每个后续重复项的行。

最终，我需要识别(例如):第 51 行是第一个包含字符串“000356”的行，并且该行也出现在第 357 行和第 745 行中。

逐行进行搜索(在 VBA 中)非常耗时(我目前有超过 1000 行)。我需要对超过 3000 行的列进行类似的搜索。

我的研究表明，使用 VBA 字典是识别重复项的更快方法。

在下面的小测试程序中，我无法让它工作，更具体地说，我需要确定电子表格中的哪一行存在重复数字。

是否有更好的方法来实现这一点，以及如何修改下面的测试代码？

'   From Module M2A to test faster search methods
'   Needs "Microsoft Scripting Runtime" enabled

Dim shtCFYsheet As Worksheet
Dim oFound As Boolean
Dim junk, actName As String
Dim lastrowCFYsheet As Long
Dim dictA As New Scripting.dictionary
Dim keyA, keyB As Variant

Set shtCFYsheet = Worksheets("Communify Sheet")
lastrowCFYsheet = shtCFYsheet.Cells(Rows.Count, "A").End(xlUp).Row

'   Load up DictA with all the entries from Column R

For i = 2 To lastrowCFYsheet 'Row 1 contains headings
    dictA(Trim(shtCFYsheet.Cells(i, "R").Value)) = 1
Next i

For Each keyA In dictA.Keys
    junk = DoEvents()
    oFound = False 'reset the flag for the next KeyA entry

    EntryA = keyA ' Capture the DictA entry

    'Search for the first DictA entry throughout the DictA dictionary
    For Each keyB In dictA.Keys
        EntryB = keyB ' Capture the DictB entry
        'Test for a match
        If Trim(EntryA) = Trim(EntryB) Then
            If oFound = True Then Debug.Print "Match:" & EntryA, EntryB, "A-row " & _
              dictA.Item(keyA), "B-row " & dictA.Item(keyB)
            'Ignore first match as that's my own entry
            oFound = True 'Now set flag so that next entry gets flagged as a duplicate
        End If
    Next keyB
Next keyA

结束子

具有两个重复项的示例数据:

最佳答案

代码:

Option Explicit

Sub dupeRs()

    Dim i As Long, arr As Variant, tmp As Variant
    Dim dict As New Scripting.Dictionary

    With Worksheets("Communify Sheet")

        'load worksheet values into array
        arr = .Range(.Cells(1, "R"), .Cells(Rows.Count, "R").End(xlUp)).Value

    End With

    'build dictionary
    For i = 2 To UBound(arr, 1)
        If dict.exists(arr(i, 1)) Then
            tmp = dict.Item(arr(i, 1))
            ReDim Preserve tmp(LBound(tmp) To UBound(tmp) + 1)
            tmp(UBound(tmp)) = i
            dict.Item(arr(i, 1)) = tmp
        Else
            dict.Item(arr(i, 1)) = Array(i)
        End If
    Next i

    'optionally remove all non-duplicates
    For Each tmp In dict.Keys
        If UBound(dict.Item(tmp)) = 0 Then dict.Remove tmp
    Next tmp

    'debug.print the duplicates and row numbers
    For Each tmp In dict.Keys
        Debug.Print tmp & " in rows " & Join(dict.Item(tmp), ", ")
    Next tmp

End Sub

结果:

005610 in rows 6, 20
004702 in rows 10, 14

关于excel - 查找列中重复条目的行，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/54978475/

excel - 查找列中重复条目的行

上一篇：excel - 如何将具有可扩展行的 react 表导出到 csv 或 excel？

下一篇：r - 根据具有指定格式(例如，删除线)的单元格消除 R 中的 Excel 行？