vba - 使用 2 列删除重复项

标签 vba excel

我正在尝试删除工作表中的重复 ID。例如这里是几行数据

ID   |  Department |  Sales   | Update Date
1    | Sales       | 100      | 
2    | Marketing   | 100      | 
2    | Marketing   | 200      | 30/06/2015
2    | Marketing   | 300      | 05/07/2015

我想删除重复的 ID,但基于更新日期列。所以我只想剩下以下内容:
ID   |  Department |  Sales   | Update Date
1    | Sales       | 100      | 
2    | Marketing   | 300      | 05/07/2015

因此,它会检查该 ID 的最新更新行并删除其他行。

任何关于使用 VBA 或宏来执行此操作的建议都会很棒,因为它将构成自动化脚本的一部分。

最佳答案

实现您想要做的事情的一种方法是读取所有行并遍历每个重复行并根据找到最高 update_date 找到要保留的内容、要删除的内容。

我已经成功地编写了一个宏来做到这一点。这是我的代码:

第一 :在 VBA 编辑器中创建一个空白模块并粘贴以下代码:

Public Type Row

    id As String
    updated As Date

    row_number As Integer    'to know which rows to delete later
    is_duplicate As Boolean  'to mark if current row is duplicate
    to_keep As Boolean       'to decide whether to keep or to delete
    verified As Boolean      'needed to avoid evaluating all rows with the same ID

End Type



Sub RemoveDuplicates()

Range("a2").Select    'go to first row
Dim cnt As Integer    'keep record of how many rows
cnt = 0               'begin with an empty array
Dim rows() As Row     'declared without the count


'== step 1: read all data and store in array ===============
Do While ActiveCell.Value <> ""
    cnt = cnt + 1
    ReDim Preserve rows(cnt)   'expand the size of the array by ONE
    rows(cnt - 1).row_number = ActiveCell.Row   'keep record of current row address

    If ActiveCell.Offset(1, 0).Value = ActiveCell.Value Or _
       ActiveCell.Offset(-1, 0).Value = ActiveCell.Value Then

       'if the cell above/below has the samve ID as the current cell, then it's duplicates

        rows(cnt - 1).is_duplicate = True
    Else
        rows(cnt - 1).is_duplicate = False
    End If

    rows(cnt - 1).id = ActiveCell.Value                   'store the id
    rows(cnt - 1).updated = ActiveCell.Offset(0, 3).Value 'store the date
    ActiveCell.Offset(1, 0).Select                        'move to the next row below
Loop


'=== step 2: iterating throw the array and deciding what to keep, what to delete =========
For i = 0 To cnt - 1
    If rows(i).is_duplicate And Not rows(i).verified Then
        'the current ID is duplicated, and all of the other rows with the same ID has not been verified
        find_to_keep rows, rows(i).id, cnt   'helper Sub to analyze each row

    End If
Next


'==== step 3: iterating throw the array to delete ones marked to delete ==========

For i = cnt - 1 To 0 Step -1  'we have to reverse the order because deleted rows will contain data from other valid rows

    If rows(i).is_duplicate And Not rows(i).to_keep Then
        'if the current row is duplicate and is not marked (to keep) then it must be deleted

        Dim r As Integer
        r = rows(i).row_number   'get the rows number (address) of the row

        Range(r & ":" & r).EntireRow.Delete shift:=xlShiftUp   'delete the row and shift the other rows below UP

    End If

Next

End Sub

Sub find_to_keep(ByRef rows() As Row, ByVal id As String, ByVal cnt As Integer)
    Dim max_date As Date   'temparary variable to hold the maximum date

    Dim to_keep As Integer  'temporary variable to hold the location of row to keep


    ' -- step a: go throw the array and find all rows with id specified in the sub parameter
    For i = 0 To cnt - 1
        If rows(i).id = id Then
            'if that row has a date that is higher than our current max_date, the read its date
            If rows(i).updated > max_date Then
                max_date = rows(i).updated
                to_keep = i
            End If
        End If
    Next



    '-- step b: now that we know what row to keep, we need to do:
    '           1- mark all other rows having the same ID as verified (to avoid looping through them again)
    '           2- mark the row with the highest date to (to_keep) = true

    For i = 0 To cnt - 1
        If rows(i).id = id Then
            If i = to_keep Then
                rows(i).to_keep = True
            Else
                rows(i).to_keep = False
            End If
            rows(i).verified = True

        End If
    Next

End Sub

这是它的样子:screenshot of excel result

如果您愿意,我附上了整个工作簿供您引用:remove_Duplicates.xlsm

关于vba - 使用 2 列删除重复项,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31133437/

相关文章:

mysql - 检查excel vba中是否存在DSN

vba - 从文本框中获取值并将其存储在 VBA 中的变量中

vba - MS-Access:数据更新后如何刷新表单?

excel - 将多张工作表中的同一行复制到excel中的一张工作表中

excel - 如何在所有现有 Excel 工作表之后添加工作表?

java - 使用Java更改Excel日期数据保存到数据库时的日期格式

vba - 在使用 VBA 填充列表的 Excel 中输入下拉列表时自动完成

Excel – 使用 VB 脚本生成 Word 文档

sql-server - 为什么 SSIS 导入 8500 条记录所需的时间比通过 SSMS 手动导入的时间要长得多?

java - Java中的Biff异常