sql - 优化 SQL 查询以移除游标

标签 sql sql-server-2005 optimization

我正在尝试编写一个查询,该查询将遍历一个表并将帐户上的任何信用应用于最早的余额。我想不出不使用游标的方法,我知道应该不惜一切代价避免使用游标,所以我来这里寻求帮助。

select * into #balances from [IDAT_AR_BALANCES] where amount > 0
select * into #credits from [IDAT_AR_BALANCES] where amount < 0

create index ba_ID on #balances (CLIENT_ID)
create index cr_ID on #credits (CLIENT_ID)

declare credit_cursor cursor for
select [CLIENT_ID], amount, cvtGUID from #credits

open credit_cursor
declare @client_id varchar(11)
declare @credit money
declare @balance money
declare @cvtGuidBalance uniqueidentifier
declare @cvtGuidCredit uniqueidentifier
fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
while @@fetch_status = 0
begin
      while(@credit < 0 and (select count(*) from #balances where @client_id = CLIENT_ID and amount <> 0) > 0)
      begin
            select top 1  @balance = amount, @cvtGuidBalance = cvtGuid from #balances where @client_id = CLIENT_ID and amount <> 0 order by AGING_DATE
            set @credit = @balance + @credit
            if(@credit > 0)
            begin
                  update #balances set amount = @credit where cvtGuid = @cvtGuidBalance
                  set @credit = 0
            end
            else
            begin
                  update #balances set amount = 0 where cvtGuid = @cvtGuidBalance
            end
      end
      update #credits set amount = @credit where cvtGuid = @cvtGuidCredit
      fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
end

close credit_cursor
deallocate credit_cursor

delete #balances where AMOUNT = 0
delete #credits where AMOUNT = 0

truncate table [IDAT_AR_BALANCES]

insert [IDAT_AR_BALANCES] select * from #balances
insert [IDAT_AR_BALANCES] select * from #credits

drop table #balances
drop table #credits

在我的 10000 条记录和 1000 个客户端的测试用例中,运行需要 26 秒,通过在 CLIENT_ID 上添加两个索引,我能够将数字降低到 14 秒。然而,这对于我需要的来说仍然太慢,最终结果可能有多达 10000 个客户端和超过 4,000,000 条记录,因此运行时间很容易变成两位数的分钟。

任何关于我如何重构它以删除光标的建议将不胜感激。

示例( 更新以显示运行后您可以拥有多个积分 ):
before
cvtGuid      client_id      ammount     AGING_DATE
xxxxxx       1              20.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              -10.00      1/3/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       2              20.00       1/1/2011
xxxxxx       2              15.00       1/2/2011
xxxxxx       2              -40.00      1/3/2011
xxxxxx       2              5.00        1/4/2011
xxxxxx       3              10.00       1/1/2011
xxxxxx       3              -20.00      1/2/2011
xxxxxx       3              5.00        1/3/2011
xxxxxx       3              -8.00       1/4/2011

after
cvtGuid      client_id      ammount     AGING_DATE
xxxxxx       1              10.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       3              -5.00       1/2/2011
xxxxxx       3              -8.00       1/4/2011

所以它会将负信用应用于最早的正余额(示例中的客户 1),如果在完成后没有剩余的正余额,它会留下剩余的负余额(客户 3),如果它们完全抵消(这是90% 的情况下使用真实数据)它将完全删除记录(客户端 2)。

最佳答案

可以在递归 CTE 的帮助下解决这个问题。

基本思想是这样的:

  • 分别获取每个帐户的正值和负值的总数 ( client_id )。
  • 迭代每个帐户并“削减”两个总数之一的数量,具体取决于 amount的符号和绝对值(即永远不会“夹断”相应的总数超过其当前值)。应该在 amount 中添加/减去相同的值.
  • 更新后,删除 amount 所在的行已变为 0。

  • 对于我的解决方案,我借用了 Lieven 的表变量定义(谢谢!),添加一列( cvtGuid ,为了演示目的声明为 int)和一行(原始示例中的最后一行,其中Lieven 的脚本中缺少)。
    /* preparing the demonstration data */
    DECLARE @IDAT_AR_BALANCES TABLE (
      cvtGuid int IDENTITY,
      client_id INTEGER
      , amount FLOAT
      , date DATE
    );
    INSERT INTO @IDAT_AR_BALANCES
      SELECT 1, 20.00, '1/1/2011'
      UNION ALL SELECT 1, 30.00, '1/2/2011'
      UNION ALL SELECT 1, -10.00, '1/3/2011'
      UNION ALL SELECT 1, 5.00, '1/4/2011'
      UNION ALL SELECT 2, 20.00, '1/1/2011'
      UNION ALL SELECT 2, 15.00, '1/2/2011'
      UNION ALL SELECT 2, -40.00, '1/3/2011'
      UNION ALL SELECT 2, 5.00, '1/4/2011'
      UNION ALL SELECT 3, 10.00, '1/1/2011'
      UNION ALL SELECT 3, -20.00, '1/2/2011'
      UNION ALL SELECT 3, 5.00, '1/3/2011'
      UNION ALL SELECT 3, -8.00, '1/4/2011';
    
    /* checking the original contents */
    SELECT * FROM @IDAT_AR_BALANCES;
    
    /* getting on with the job: */
    WITH totals AS (
      SELECT
        /* 1) preparing the totals */
        client_id,
        total_pos = SUM(CASE WHEN amount > 0 THEN amount END),
        total_neg = SUM(CASE WHEN amount < 0 THEN amount END)
      FROM @IDAT_AR_BALANCES
      GROUP BY client_id
    ),
    refined AS (
      /* 2) refining the original data with auxiliary columns:
         * rownum - row numbers (unique within accounts);
         * amount_to_discard_pos - the amount to discard `amount` completely if it's negative;
         * amount_to_discard_neg - the amount to discard `amount` completely if it's positive
      */
      SELECT
        *,
        rownum = ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date),
        amount_to_discard_pos = CAST(CASE WHEN amount < 0 THEN -amount ELSE 0 END AS float),
        amount_to_discard_neg = CAST(CASE WHEN amount > 0 THEN -amount ELSE 0 END AS float)
      FROM @IDAT_AR_BALANCES
    ),
    prepared AS (
      /* 3) preparing the final table (using a recursive CTE) */
      SELECT
        cvtGuid = CAST(NULL AS int),
        client_id,
        amount = CAST(NULL AS float),
        date = CAST(NULL AS date),
        amount_update = CAST(NULL AS float),
        running_balance_pos = total_pos,
        running_balance_neg = total_neg,
        rownum = CAST(0 AS bigint)
      FROM totals
      UNION ALL
      SELECT
        n.cvtGuid,
        n.client_id,
        n.amount,
        n.date,
        amount_update = CAST(
          CASE
            WHEN n.amount_to_discard_pos < p.running_balance_pos
            THEN n.amount_to_discard_pos
            ELSE p.running_balance_pos
          END
          +
          CASE
            WHEN n.amount_to_discard_neg > p.running_balance_neg
            THEN n.amount_to_discard_neg
            ELSE p.running_balance_neg
          END
        AS float),
        running_balance_pos = CAST(p.running_balance_pos -
          CASE
            WHEN n.amount_to_discard_pos < p.running_balance_pos
            THEN n.amount_to_discard_pos
            ELSE p.running_balance_pos
          END
        AS float),
        running_balance_neg = CAST(p.running_balance_neg -
          CASE
            WHEN n.amount_to_discard_neg > p.running_balance_neg
            THEN n.amount_to_discard_neg
            ELSE p.running_balance_neg
          END
        AS float),
        n.rownum
      FROM refined n
        INNER JOIN prepared p ON n.client_id = p.client_id AND n.rownum = p.rownum + 1
    )
    /*                  -- some junk that I've forgotten to clean up,
    SELECT *            -- which you might actually want to use
    FROM prepared       -- to view the final prepared result set
    WHERE rownum > 0    -- before actually running the update
    ORDER BY client_id, rownum
    */
    /* performing the update */
    UPDATE t
    SET amount = t.amount + u.amount_update
    FROM @IDAT_AR_BALANCES t INNER JOIN prepared u ON t.cvtGuid = u.cvtGuid
    OPTION (MAXRECURSION 0);
    
    /* checking the contents after UPDATE */
    SELECT * FROM @IDAT_AR_BALANCES;
    
    /* deleting the eliminated amounts */
    DELETE FROM @IDAT_AR_BALANCES WHERE amount = 0;
    
    /* checking the contents after DELETE */
    SELECT * FROM @IDAT_AR_BALANCES;
    

    更新

    正如 Lieven 正确建议的那样(再次感谢您!),您可以从 amount 的帐户中删除所有行。先加起来为 0,然后更新其他行。这将提高整体性能,因为正如您所说,大多数数据的数量加起来为 0。

    以下是 Lieven 删除“零帐户”解决方案的变体:
    DELETE FROM @IDAT_AR_BALANCES
    WHERE client_id IN (
      SELECT client_id
      FROM @IDAT_AR_BALANCES
      GROUP BY client_id
      HAVING SUM(amount) = 0
    )
    

    但请记住,DELETE更新之后还是需要的,因为更新可能会重置一些amount值为 0。如果我是你,我可能会考虑创建一个 FOR UPDATE 触发器,它会自动删除 amount = 0 所在的行。 .这样的解决方案并不总是可以接受的,但有时很好。这取决于您还可以对数据做什么。这也可能取决于它是否仅仅是您的项目,或者还有其他维护者(他们不喜欢“神奇地”和意外消失的行)。

    关于sql - 优化 SQL 查询以移除游标,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5809266/

    相关文章:

    mysql - 从另一个查询返回的记录的列求和

    mysql - 帮助改进此 SQL UNION 查询

    mysql - SQL 查询以显示与另一个表的列中的条件完全匹配的数据

    sql-server-2005 - 语句 "USE @dbname"不起作用,为什么?怎么做?

    c++ - 配置文件引导优化后嵌套 for 循环更快,但缓存未命中率更高

    java - 优化PIG中的过滤功能

    c++ - move 或命名返回值优化 (NRVO)?

    mysql - 在此更新中指定哪个表?

    sql - 为什么存储过程不能与 select、where 和 having 一起使用

    sql-server - TSQL:从一组具有最低正值的列中选择