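This is an easy task for the human eye, but doing it by hand would take me two weeks.
I'm using a UDF I got from this site to add TEXTJOIN functionality to my Excel 2011. It works well, but I've run into a problem with a new spreadsheet that has nearly 50,000 rows. I can't simply remove duplicates, because in some cases the data is legitimately repeated and I need to capture it. However, I found another column that holds unique data associated with the repeated data, and by looking at that second data column I can easily tell what is a duplicate and what needs to be included.
I've created a simplified example spreadsheet, since that's easier than explaining it in words.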
Source data:
Acct#  Lname, Fname   Date     Data#1  Data#2
42     Doe, John      1/1/17   10001   1001
42     Doe, John      1/1/17   10001   1001
42     Doe, John      1/1/17   30003   1001
42     Doe, John      1/1/17   10001   1002
42     Doe, John      1/1/17   10001   1002
42     Doe, John      1/1/17   30003   1002
70     Smith, Jane    2/1/17   10001   2001
70     Smith, Jane    2/1/17   20002   2001
70     Smith, Jane    2/1/17   30003   2001
70     Smith, Jane    2/1/17   10001   2002
70     Smith, Jane    2/1/17   20002   2002
70     Smith, Jane    2/1/17   30003   2002
70     Smith, Jane    2/1/17   10001   2003
70     Smith, Jane    2/1/17   20002   2003
70     Smith, Jane    2/1/17   30003   2003
93     Blow, Joe      1/1/17   10001   3001
93     Blow, Joe      1/1/17   20002   3001
93     Blow, Joe      1/1/17   30003   3001
93     Blow, Joe      1/1/17   10001   3002
93     Blow, Joe      1/1/17   20002   3002
93     Blow, Joe      1/1/17   30003   3002
177    Bryant, Kobe   2/1/17   10001   4001
177    Bryant, Kobe   2/1/17   30003   4001
177    Bryant, Kobe   2/1/17   30003   4001
177    Bryant, Kobe   2/1/17   10001   4002
177    Bryant, Kobe   2/1/17   30003   4002
177    Bryant, Kobe   2/1/17   30003   4002
177    Bryant, Kobe   2/1/17   10001   4003
177    Bryant, Kobe   2/1/17   30003   4003
177    Bryant, Kobe   2/1/17   30003   4003
Output data:
Acct#  (Lname, Fname)  Date     Data#1  Data#2
42     Doe, John       1/1/17   10001, 10001, 30003, 10001, 10001, 30003
70     Smith, Jane     2/1/17   10001, 20002, 30003, 10001, 20002, 30003, 10001, 20002, 30003
93     Blow, Joe       1/1/17   10001, 20002, 30003, 10001, 20002, 30003
177    Bryant, Kobe    2/1/17   10001, 30003, 30003, 10001, 30003, 30003, 10001, 30003, 30003
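I don't know how to put an Excel spreadsheet into my question, so here is a screenshot of what I currently have along with the desired output (for now I'm only focusing on the Data#1 column).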
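A minimal sketch of the rule described in the question, assuming "duplicate" means a Data#2 block whose sequence of Data#1 values repeats an earlier block for the same account (my reading of the sample data, not something the question states outright). Python is used only to illustrate the logic outside Excel; the function name `pattern_per_account` and the tuple row layout are hypothetical.

```python
def pattern_per_account(rows):
    """Keep each distinct Data#1 block pattern once per (acct, name, date).

    rows: iterable of (acct, name, date, data1, data2) tuples, a hypothetical
    in-memory copy of the sheet. Relies on dicts keeping insertion order
    (Python 3.7+).
    """
    groups = {}
    for acct, name, date, data1, data2 in rows:
        # One dict of Data#2 blocks per account; each block lists its Data#1 values
        blocks = groups.setdefault((acct, name, date), {})
        blocks.setdefault(data2, []).append(data1)

    result = {}
    for key, blocks in groups.items():
        distinct = []
        for pattern in blocks.values():
            if pattern not in distinct:  # drop blocks that repeat an earlier pattern
                distinct.append(pattern)
        # Flatten the surviving patterns back into one list of Data#1 values
        result[key] = [value for pattern in distinct for value in pattern]
    return result


rows = [
    (42, "Doe, John", "1/1/17", 10001, 1001),
    (42, "Doe, John", "1/1/17", 10001, 1001),
    (42, "Doe, John", "1/1/17", 30003, 1001),
    (42, "Doe, John", "1/1/17", 10001, 1002),
    (42, "Doe, John", "1/1/17", 10001, 1002),
    (42, "Doe, John", "1/1/17", 30003, 1002),
    (177, "Bryant, Kobe", "2/1/17", 10001, 4001),
    (177, "Bryant, Kobe", "2/1/17", 30003, 4001),
    (177, "Bryant, Kobe", "2/1/17", 30003, 4001),
    (177, "Bryant, Kobe", "2/1/17", 10001, 4002),
    (177, "Bryant, Kobe", "2/1/17", 30003, 4002),
    (177, "Bryant, Kobe", "2/1/17", 30003, 4002),
]

for (acct, name, date), values in pattern_per_account(rows).items():
    print(acct, name, date, ", ".join(map(str, values)))
# 42 Doe, John 1/1/17 10001, 10001, 30003
# 177 Bryant, Kobe 2/1/17 10001, 30003, 30003
```

Because whole blocks are compared, an account whose Data#2 blocks genuinely differ keeps every distinct block, while repeated Data#1 values inside a single block (such as Kobe's two 30003s) are preserved.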
As requested, here is the VBA code I'm using to extract Data#1:
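Function TEXTJOIN(delim As String, skipblank As Boolean, arr)
    Dim d As Long
    Dim c As Long
    Dim arr2()
    Dim t As Long, y As Long
    t = -1
    y = -1
    ' Copy a Range's values into an array; otherwise use the array as passed in
    If TypeName(arr) = "Range" Then
        arr2 = arr.Value
    Else
        arr2 = arr
    End If
    ' Probe for a second dimension: UBound(arr2, 2) raises an error on a 1-D array
    On Error Resume Next
    t = UBound(arr2, 2)
    y = UBound(arr2, 1)
    On Error GoTo 0
    If t >= 0 And y >= 0 Then
        ' 2-D array: walk the rows (dimension 1), then the columns (dimension 2)
        For c = LBound(arr2, 1) To UBound(arr2, 1)
            For d = LBound(arr2, 2) To UBound(arr2, 2)
                If arr2(c, d) <> "" Or Not skipblank Then
                    TEXTJOIN = TEXTJOIN & arr2(c, d) & delim
                End If
            Next d
        Next c
    Else
        ' 1-D array
        For c = LBound(arr2) To UBound(arr2)
            If arr2(c) <> "" Or Not skipblank Then
                TEXTJOIN = TEXTJOIN & arr2(c) & delim
            End If
        Next c
    End If
    ' Strip the trailing delimiter; skip this when nothing was joined, or Left gets a negative length
    If Len(TEXTJOIN) > 0 Then TEXTJOIN = Left(TEXTJOIN, Len(TEXTJOIN) - Len(delim))
End Function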
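For reference, the UDF above boils down to: read the range (or array) row by row, optionally skip empty cells, and join what's left with the delimiter. The Python sketch below restates that logic; the name `text_join` is hypothetical, a list of lists stands in for a 2-D range, and `None` stands in for an empty cell. Building a list and calling `str.join` avoids appending a delimiter after every item and trimming the last one afterwards.

```python
def text_join(delim, skip_blank, values):
    """Sketch of the TEXTJOIN UDF's logic.

    values: a flat list (1-D array) or a list of rows (2-D range, read row by
    row). None stands in for an empty cell.
    """
    if values and all(isinstance(row, (list, tuple)) for row in values):
        cells = [cell for row in values for cell in row]  # 2-D: row-major order
    else:
        cells = list(values)  # 1-D
    texts = ["" if cell is None else str(cell) for cell in cells]
    if skip_blank:
        texts = [text for text in texts if text != ""]
    return delim.join(texts)


print(text_join(", ", True, [[10001], [None], [30003]]))  # 10001, 30003
print(text_join("-", False, ["a", "", "b"]))              # a--b
```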
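I tried the suggested subroutine and got the following output:
Best answer
Here is some code that works at least for your sample data, although I suspect it will hold up badly once the data varies even a little.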
Option Explicit
Sub PatternFilter()
    Dim ws As Worksheet
    Dim index1_col As String
    Dim index2_col As String
    Dim data1_col As String
    Dim data2_col As String
    Dim lastrow As Long
    Dim lastentryrow As Long
    Dim outputline As Long
    Dim iter1 As Long
    Dim iter2 As Long
    Dim datastring As String
    Set ws = ThisWorkbook.Sheets("Sheet1")
    index1_col = "A" ' the column with the unique identifier, such as name or acct#
    index2_col = "C" ' a second column (here the date) that, together with index1, separates one entry from the next
    data1_col = "D" ' data1 column
    data2_col = "E" ' data2 column
    lastrow = ws.Range(index1_col & ws.Rows.Count).End(xlUp).Row ' so we know where to stop
    outputline = 2 ' just an incrementor to keep putting data on unique lines
    For iter1 = 2 To lastrow Step 1
        ' find the last row of this entry; start at iter1 so a one-row entry is handled too
        lastentryrow = lastrow
        For iter2 = iter1 To lastrow Step 1
            If (ws.Range(index1_col & iter2).Value <> ws.Range(index1_col & iter2 + 1).Value) _
                Or (ws.Range(index2_col & iter2).Value <> ws.Range(index2_col & iter2 + 1).Value) Then
                lastentryrow = iter2
                Exit For
            End If
        Next
        ' collect the data1 pattern: append each value, stopping after the first row where data2 changes
        datastring = "" ' reset our output string
        For iter2 = iter1 To lastentryrow Step 1
            If datastring = "" Then
                datastring = ws.Range(data1_col & iter2).Value
            Else
                datastring = datastring & "," & ws.Range(data1_col & iter2).Value
            End If
            If ws.Range(data2_col & iter2).Value <> ws.Range(data2_col & iter2 + 1).Value Then Exit For
        Next
        ws.Range("I" & outputline).Value = datastring ' save the data1 pattern
        ' collect the data2 pattern: one value each time data2 changes (the entry's first row always counts)
        datastring = "" ' reset the output string
        For iter2 = iter1 To lastentryrow Step 1
            If iter2 = iter1 Or ws.Range(data2_col & iter2).Value <> ws.Range(data2_col & iter2 - 1).Value Then
                If datastring = "" Then
                    datastring = ws.Range(data2_col & iter2).Value
                Else
                    datastring = datastring & "," & ws.Range(data2_col & iter2).Value
                End If
            End If
        Next
        ws.Range("J" & outputline).Value = datastring ' save the data2 pattern
        ws.Range("G" & outputline).Value = ws.Range(index1_col & iter1).Value ' put the unique identifier with the data so we know who the data belongs to
        ws.Range("H" & outputline).Value = ws.Range(index2_col & iter1).Value
        outputline = outputline + 1 ' increment the output line to avoid overwriting existing data
        iter1 = lastentryrow ' jump to the entry's last row (Next then moves on to the first row of the next entry)
    Next
End Sub
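Of course, this can be adjusted to write its output wherever you want, including a new worksheet. Let me know if there's anything you don't understand or that doesn't work for you.
*Edit: added a second index column per the OP's request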
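The scan PatternFilter performs can also be sketched in Python, assuming the sheet has been read into a list of (acct, name, date, data1, data2) tuples sorted the way the sample is. Here `itertools.groupby` plays the role of the boundary search on columns A and C, `takewhile` stops the Data#1 pattern at the first change in Data#2, and grouping the Data#2 column again yields each run's value once. The name `pattern_filter` and the row layout are hypothetical. Like the VBA, it only reads the first Data#2 block, so it assumes every block within an entry repeats the same pattern.

```python
from itertools import groupby, takewhile


def pattern_filter(rows):
    """Sketch of PatternFilter on in-memory rows of (acct, name, date, data1, data2).

    Rows must be sorted so that each (acct, date) entry is contiguous,
    as in the sample sheet.
    """
    output = []
    # Consecutive rows with the same acct (column A) and date (column C) form one entry
    for (acct, date), run in groupby(rows, key=lambda r: (r[0], r[2])):
        run = list(run)
        first_data2 = run[0][4]
        # Data#1 pattern: values up to the first change in Data#2
        data1 = [r[3] for r in takewhile(lambda r: r[4] == first_data2, run)]
        # Data#2: one value per consecutive run of identical Data#2 values
        data2 = [value for value, _ in groupby(r[4] for r in run)]
        output.append((acct, date, ",".join(map(str, data1)), ",".join(map(str, data2))))
    return output


rows = [
    (70, "Smith, Jane", "2/1/17", 10001, 2001),
    (70, "Smith, Jane", "2/1/17", 20002, 2001),
    (70, "Smith, Jane", "2/1/17", 30003, 2001),
    (70, "Smith, Jane", "2/1/17", 10001, 2002),
    (70, "Smith, Jane", "2/1/17", 20002, 2002),
    (70, "Smith, Jane", "2/1/17", 30003, 2002),
    (70, "Smith, Jane", "2/1/17", 10001, 2003),
    (70, "Smith, Jane", "2/1/17", 20002, 2003),
    (70, "Smith, Jane", "2/1/17", 30003, 2003),
]

for line in pattern_filter(rows):
    print(line)
# (70, '2/1/17', '10001,20002,30003', '2001,2002,2003')
```

Working on an in-memory list also mirrors a common speed-up for the VBA version on a sheet of about 50,000 rows: reading the whole block once through `Range.Value` into an array, rather than calling `ws.Range(...)` for every cell, is likely to be considerably faster.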
Regarding "excel - Can Excel recognize a pattern of comma-separated numbers in a cell and remove duplicates of that pattern?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45578077/