我有一个电子表格,其中包含 0 到 90000 之间的数字列表,这些数字以字符串形式存储在 R 列中。
由另一个系统分配的每个号码都应该是唯一的。大约 5% 以前使用过一次或多次。我无法控制其他系统。
每个月我都会向此列添加大约 50 个数字。我需要确定列表中是否已存在任何新数字(包括添加的新数字),并确定电子表格中包含第一个重复项和每个后续重复项的行。
最终,我需要识别(例如):第 51 行是第一个包含字符串“000356”的行,并且该行也出现在第 357 行和第 745 行中。
逐行进行搜索(在 VBA 中)非常耗时(我目前有超过 1000 行)。我需要对超过 3000 行的列进行类似的搜索。
我的研究表明,使用 VBA 字典是识别重复项的更快方法。
在下面的小测试程序中,我无法让它工作,更具体地说,我需要确定电子表格中的哪一行存在重复数字。
是否有更好的方法来实现这一点,以及如何修改下面的测试代码?
' From Module M2A to test faster search methods
' Needs "Microsoft Scripting Runtime" enabled
Dim shtCFYsheet As Worksheet
Dim oFound As Boolean
Dim junk, actName As String
Dim lastrowCFYsheet As Long
Dim dictA As New Scripting.dictionary
Dim keyA, keyB As Variant
Set shtCFYsheet = Worksheets("Communify Sheet")
lastrowCFYsheet = shtCFYsheet.Cells(Rows.Count, "A").End(xlUp).Row
' Load up DictA with all the entries from Column R
For i = 2 To lastrowCFYsheet 'Row 1 contains headings
dictA(Trim(shtCFYsheet.Cells(i, "R").Value)) = 1
Next i
For Each keyA In dictA.Keys
junk = DoEvents()
oFound = False 'reset the flag for the next KeyA entry
EntryA = keyA ' Capture the DictA entry
'Search for the first DictA entry throughout the DictA dictionary
For Each keyB In dictA.Keys
EntryB = keyB ' Capture the DictB entry
'Test for a match
If Trim(EntryA) = Trim(EntryB) Then
If oFound = True Then Debug.Print "Match:" & EntryA, EntryB, "A-row " & _
dictA.Item(keyA), "B-row " & dictA.Item(keyB)
'Ignore first match as that's my own entry
oFound = True 'Now set flag so that next entry gets flagged as a duplicate
End If
Next keyB
Next keyA
结束子
具有两个重复项的示例数据:
2456
4863
4190
2123
5610
9061
2640
679
4702
7428
38
3082
4702
8391
8781
998
2091
3729
5610
5051
1796
3355
169
1788
8838
最佳答案
代码:
Option Explicit
Sub dupeRs()
Dim i As Long, arr As Variant, tmp As Variant
Dim dict As New Scripting.Dictionary
With Worksheets("Communify Sheet")
'load worksheet values into array
arr = .Range(.Cells(1, "R"), .Cells(Rows.Count, "R").End(xlUp)).Value
End With
'build dictionary
For i = 2 To UBound(arr, 1)
If dict.exists(arr(i, 1)) Then
tmp = dict.Item(arr(i, 1))
ReDim Preserve tmp(LBound(tmp) To UBound(tmp) + 1)
tmp(UBound(tmp)) = i
dict.Item(arr(i, 1)) = tmp
Else
dict.Item(arr(i, 1)) = Array(i)
End If
Next i
'optionally remove all non-duplicates
For Each tmp In dict.Keys
If UBound(dict.Item(tmp)) = 0 Then dict.Remove tmp
Next tmp
'debug.print the duplicates and row numbers
For Each tmp In dict.Keys
Debug.Print tmp & " in rows " & Join(dict.Item(tmp), ", ")
Next tmp
End Sub
结果:
005610 in rows 6, 20
004702 in rows 10, 14
关于excel - 查找列中重复条目的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54978475/