sql-server - 合并 SQL Server 中的重复记录

标签 sql-server tsql

SQL Fiddle

我有下表

CREATE TABLE __EpiTest
(
    `ActivityRecordID` int, 
    `ActCstID` varchar(6), 
    `ResCstID` varchar(6), 
    `VolAmt` int, 
    `ActCnt` int, 
    `TotOCst` int, 
    `TotCst` int
);

INSERT INTO __EpiTest (`ActivityRecordID`, `ActCstID`, `ResCstID`, `VolAmt`, `ActCnt`, `TotOCst`, `TotCst`)
VALUES (15652, 'DIM008', 'CPF005', 30.455249786377, 1, 0, 0.375024198767061),
       (15652, 'DIM008', 'CSC004', 30.455249786377, 1, 7.62176510799961, 11.932578069479),
       (15652, 'DIM008', 'REC001', 30.455249786377, 1, 0.17902367836393, 0.384881520159455),
       (15652, 'OUT001', 'CPF002', 15, 0, 0, 16.9408193013078),
       (15652, 'OUT001', 'CSC001', 15, 0, 2.36971564207042,  2.36971564207042),
       (15652, 'OUT001', 'CSC004', 15, 0, 12.3230666021278, 12.3760690367354),
       (15652, 'OUT001', 'REC001', 15, 0, 0.377459387378349, 3.0275278374102),
       (15652, 'SUP001', 'CPF002', 1, 1, 0, 0.00108648359810756),
       (15652, 'SUP001', 'CPF011', 1, 1, 0, -1.89799880202357E-14),
       (15652, 'SUP001', 'CPF020', 1, 1, 0, 1.31058251625567E-05),
       (15652, 'SUP001', 'CPF021', 1, 1, 0, 25.0942308512551),
       (15652, 'SUP001', 'CPF021', 1, 1, 0, 25.0942308512551),
       (15652, 'SUP001', 'CSC001', 1, 1, 1.9628769103451, 1.9628769103451),
       (15652, 'SUP001', 'CSC001', 1, 1, 1.9628769103451, 1.9628769103451),
       (15652, 'SUP001', 'CSC002', 1, 1, 0, 10.2266625467779),
       (15652, 'SUP001', 'CSC004', 1, 1, 16.3451721608005, 16.3513319060046),
       (15652, 'SUP001', 'CSC004', 1, 1, 16.3451721608005, 16.3513319060046),
       (15652, 'SUP001', 'REC001', 1, 1, 0.254410386701976, -6.27048795659376),
       (15652, 'SUP001', 'REC001', 1, 1, 0.254410386701976, -6.27048795659376),
       (15652, 'SUP001', 'REC002', 1, 1, 0, 1.10781732547441);

请注意,有些行的值与 [ActivityRecordID] 匹配, [ActCstID][ResCstID] .我想合并这些值并对 [TotOCst] 中的值求和和 [TotCst] .为此,我尝试使用 MERGE
MERGE [__EpiTest] AS Tgt 
USING (
    SELECT [ActivityRecordID], 
           [ActCstID], 
           [ResCstID], 
           SUM([TotOCst]) AS TotOCst, 
           SUM([TotCst]) AS TotCst 
    FROM [__EpiTest]  
    GROUP BY [ActivityRecordID], 
             [ActCstID], 
             [ResCstID]) AS Src 
ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
     Tgt.[ActCstID] = Src.[ActCstID] AND 
     Tgt.[ResCstID] = Src.[ResCstID])  
WHEN MATCHED THEN 
    UPDATE 
    SET [TotOCst] = Src.[TotOCst], 
        [TotCst] = Src.[TotCst] 
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;
GO

这匹配并正确更新 [TotOCst] 中的值和 [TotCst]对于每个重复项,但它会在表中留下重复的行,而我希望除一个之外的所有行都删除。我怎样才能做到这一点?

请注意,目标表很大,因此我想尝试使用 MERGE 的变体,通过单个操作来完成此操作。上面的查询,或其他替代方法。多个查询对我来说太昂贵而无法处理......

插图

我得到
...
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255
15652   SUP001  CSC002  1   1   0                   10.2266625467779
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
...

但我想要
...
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255 
15652   SUP001  CSC002  1   1   0                   10.2266625467779
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
...

最佳答案

为了让它在 MERGE 语句中工作,您需要对行更具体一些。我已经调整了下面的查询:

MERGE [__EpiTest] AS Tgt 
USING (
    SELECT [ActivityRecordID], 
           [ActCstID], 
           [ResCstID], 
           SUM([TotOCst]) AS TotOCst, 
           SUM([TotCst]) AS TotCst ,
           VolAmt,
           ActCnt
    FROM [__EpiTest]  
    GROUP BY [ActivityRecordID], 
             [ActCstID], 
             [ResCstID], 
             VolAmt,
             ActCnt) AS Src 
ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
     Tgt.[ActCstID] = Src.[ActCstID] AND 
     Tgt.[ResCstID] = Src.[ResCstID] AND
     Tgt.TotOCst = Src.TotOCst AND
     Tgt.TotCst = Src.TotCst
     ) 
WHEN NOT MATCHED BY TARGET THEN
    INSERT ( ActivityRecordID, ActCstID, ResCstID, TotOCst, TotCst, VolAmt, ActCnt )
    VALUES ( ActivityRecordID, ActCstID, ResCstID, TotOCst, TotCst, VolAmt, ActCnt )
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;
GO

基本上,我已经对其进行了更改,以便合并源包含其他看起来不变的列,以便稍后插入它们。我已经更改了 MATCH条件基本上是“该行完全相同并且将保持不变”,因此删除了 WHEN MATCHED 语句。

然后我添加了一个 INSERT 语句,以便对于那些已更改的语句(它们的行将被 NOT MATCHED BY TARGET 删除),会插入一个带有这些值的新行。

在您的相同数据中,这将返回以下结果集:
ActivityRecordID    ActCstID    ResCstID    VolAmt  ActCnt  TotOCst TotCst
15652   DIM008  CPF005  30.455249786377 1   0   0.375024198767061
15652   DIM008  CSC004  30.455249786377 1   7.62176510799961    11.932578069479
15652   DIM008  REC001  30.455249786377 1   0.17902367836393    0.384881520159455
15652   OUT001  CPF002  15  0   0   16.9408193013078
15652   OUT001  CSC001  15  0   2.36971564207042    2.36971564207042
15652   OUT001  CSC004  15  0   12.3230666021278    12.3760690367354
15652   OUT001  REC001  15  0   0.377459387378349   3.0275278374102
15652   SUP001  CPF002  1   1   0   0.00108648359810756
15652   SUP001  CPF011  1   1   0   -1.89799880202357E-14
15652   SUP001  CPF020  1   1   0   1.31058251625567E-05
15652   SUP001  CSC002  1   1   0   10.2266625467779
15652   SUP001  REC002  1   1   0   1.10781732547441
15652   SUP001  CPF021  1   1   0   50.1884617025102
15652   SUP001  CSC001  1   1   3.9257538206902 3.9257538206902
15652   SUP001  CSC004  1   1   32.690344321601 32.7026638120092
15652   SUP001  REC001  1   1   0.508820773403952   -12.5409759131875

关于sql-server - 合并 SQL Server 中的重复记录,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/46470773/

相关文章:

sql - 获取新的 SQL 记录 ID

c# - 如何在负载平衡的 Azure 虚拟机上使用 SessionState

sql从两个表中获取最大数量?

java - Sql server 2005知道插入了新记录

sql-server - Debezium 服务器到 PubSub : delete-operations breaks the application

sql-server - 如何从存储过程返回临时表

c# - 我如何使用 System.Data.Sql.SqlDataSourceEnumerator 类来了解可用的 sql 数据源...?

sql - 对树数据进行分组、聚合和求和的最佳方法是什么?

sql-server - 通过连接到整数列表在 SQL Server 中插入随机行数不一致

c# - SqlError.Number 说明