sql - Select a subset of rows that exceed a percentage of the total value

Tags: sql sql-server sql-server-2008 tsql cumulative-sum

I have a table of customers, users, and revenue similar to the following (the real one has thousands of records):

Customer   User    Revenue
001        James   500
002        James   750
003        James   450
004        Sarah   100
005        Sarah   500
006        Sarah   150
007        Sarah   600
008        James   150
009        James   100

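What I want to do is return only the highest-spending customers, the ones that together make up 80% of a user's total revenue.

To do this manually, I would sort James's customers by revenue in descending order, compute each one's percentage of the total and the running total percentage, and then return only the records up to and including the one where the running total percentage reaches 80%: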
Customer    User    Revenue     % of total  Running Total %
002         James   750         0.38        0.38 
001         James   500         0.26        0.64 
003         James   450         0.23        0.87  <- Greater than 80%, last record
008         James   150         0.08        0.95 
009         James   100         0.05        1.00 
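
The manual steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the SQL solution being asked for; the function name `running_shares` and the tuple layout are assumptions made for this sketch, and the data is the sample table from the question.

```python
from itertools import accumulate

# Sample rows copied from the question: (customer, user, revenue).
rows = [
    ("001", "James", 500), ("002", "James", 750), ("003", "James", 450),
    ("004", "Sarah", 100), ("005", "Sarah", 500), ("006", "Sarah", 150),
    ("007", "Sarah", 600), ("008", "James", 150), ("009", "James", 100),
]

def running_shares(rows, user):
    """Return (customer, revenue, share, running_share) for one user, largest revenue first.

    Assumes the user's total revenue is greater than zero.
    """
    mine = sorted((r for r in rows if r[1] == user), key=lambda r: r[2], reverse=True)
    total = sum(rev for _, _, rev in mine)
    running = accumulate(rev for _, _, rev in mine)
    return [(c, rev, rev / total, run / total) for (c, _, rev), run in zip(mine, running)]

for customer, revenue, share, running in running_shares(rows, "James"):
    print(customer, revenue, round(share, 2), round(running, 2))
```

Rounded to two decimals, the running shares for James come out as 0.38, 0.64, 0.87, 0.95 and 1.0, matching the hand-built table.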

I have tried using a CTE but have come up blank so far. Is there a way to do this with a single query instead of manually in an Excel sheet?

Best answer

SQL Server 2012+ only:

You can use a windowed SUM:

WITH cte AS
(
   SELECT *,
          1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY [User]) AS percentile,
          1.0 * SUM(Revenue) OVER(PARTITION BY [User] ORDER BY [Revenue] DESC)
                /SUM(Revenue) OVER(PARTITION BY [User]) AS running_percentile
   FROM tab
)
SELECT *
FROM cte 
WHERE running_percentile <= 0.8;


SQL Server 2008:
WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t    
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM cte c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]
            AND c2.rn <= c.rn) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT *
FROM cte2
WHERE running_percentile <= 0.8;


Output:
╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║   percentile   ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0,384615384615 ║ 0,384615384615     ║
║        1 ║ James ║     500 ║ 0,256410256410 ║ 0,641025641025     ║
║        7 ║ Sarah ║     600 ║ 0,444444444444 ║ 0,444444444444     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝

Edit 2:

"That looks nearly there, the only niggle is it's missing the last row, the third row for James takes him over 0.80 but needs to be included."


WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t    
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM cte c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]
            AND c2.rn <= c.rn) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT a.*
FROM cte2 a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte2
             WHERE running_percentile >= 0.8
               AND cte2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp;


Output:
╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║   percentile   ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0,384615384615 ║ 0,384615384615     ║
║        1 ║ James ║     500 ║ 0,256410256410 ║ 0,641025641025     ║
║        3 ║ James ║     450 ║ 0,230769230769 ║ 0,871794871794     ║
║        7 ║ Sarah ║     600 ║ 0,444444444444 ║ 0,444444444444     ║
║        5 ║ Sarah ║     500 ║ 0,370370370370 ║ 0,814814814814     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝

"Looks to be perfect, translated to my big table and returns what I need, spent a good 5 minutes working through it and still can't follow what you've done!"


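SQL Server 2008 does not support everything in the OVER() clause (in particular, ORDER BY for aggregates such as SUM, which is what a windowed running total needs), but ROW_NUMBER does work.

The first cte only computes each row's position within its group: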
╔═══════════╦════════╦══════════╦════╗
║ Customer  ║ User   ║ Revenue  ║ rn ║
╠═══════════╬════════╬══════════╬════╣
║        2  ║ James  ║     750  ║  1 ║
║        1  ║ James  ║     500  ║  2 ║
║        3  ║ James  ║     450  ║  3 ║
║        8  ║ James  ║     150  ║  4 ║
║        9  ║ James  ║     100  ║  5 ║
║        7  ║ Sarah  ║     600  ║  1 ║
║        5  ║ Sarah  ║     500  ║  2 ║
║        6  ║ Sarah  ║     150  ║  3 ║
║        4  ║ Sarah  ║     100  ║  4 ║
╚═══════════╩════════╩══════════╩════╝

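The second cte:
  • the c2 subquery computes the running total based on the rank from ROW_NUMBER
  • the c3 subquery computes the full total for each user

In the final query, the s subquery finds the lowest running percentage that reaches or exceeds 80%, and the outer WHERE keeps every row up to and including that one.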
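
The same three steps (running total, per-user total, cutoff) can be sketched in Python to make the logic easier to follow. This is a sketch under assumptions, not a translation of the query plan: the function name `top_customers` and the `threshold` parameter are hypothetical, each user's total is assumed to be greater than zero, and the data is the sample table from the question.

```python
from itertools import accumulate

# Sample rows copied from the question: (customer, user, revenue).
rows = [
    ("001", "James", 500), ("002", "James", 750), ("003", "James", 450),
    ("004", "Sarah", 100), ("005", "Sarah", 500), ("006", "Sarah", 150),
    ("007", "Sarah", 600), ("008", "James", 150), ("009", "James", 100),
]

def top_customers(rows, threshold=0.8):
    """Keep each user's largest customers up to and including the one that crosses threshold."""
    kept = []
    for user in sorted({u for _, u, _ in rows}):
        # Counterpart of the first cte: order the user's rows by revenue, largest first.
        mine = sorted((r for r in rows if r[1] == user), key=lambda r: r[2], reverse=True)
        # Counterpart of c3: the user's full total.
        total = sum(rev for _, _, rev in mine)
        # Counterpart of c2: running total at each position, as a share of the total.
        shares = [run / total for run in accumulate(rev for _, _, rev in mine)]
        # Counterpart of s: the smallest running share at or above the threshold.
        cutoff = min(s for s in shares if s >= threshold)
        kept += [r for r, s in zip(mine, shares) if s <= cutoff]
    return kept

for row in top_customers(rows):
    print(row)
```

On the sample data this keeps customers 002, 001 and 003 for James and 007 and 005 for Sarah, the same five rows as the query output. Filtering on `s <= threshold` instead of `s <= cutoff` would drop the row that crosses 80%, which is the difference between the first 2008 query and the Edit 2 version.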

Edit 3:

Using ROW_NUMBER is actually redundant:
    WITH cte AS
    (
        SELECT c.Customer, c.[User], c.[Revenue]
               ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
               ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
        FROM t c
        CROSS APPLY
             (SELECT SUM(Revenue) AS s
              FROM t c2
              WHERE c.[User] = c2.[User]
                AND c2.Revenue >= c.Revenue) c2
        CROSS APPLY
             (SELECT SUM(Revenue) AS s
              FROM t c2
              WHERE c.[User] = c2.[User]) AS c3
    ) 
    SELECT a.*
    FROM cte a
    CROSS APPLY (SELECT MIN(running_percentile) AS rp
                 FROM cte c2
                 WHERE running_percentile >= 0.8
                   AND c2.[User] = a.[User]) AS s
    WHERE a.running_percentile <= s.rp
    ORDER BY [User], Revenue DESC;
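
One caveat worth checking before relying on this shorter form: comparing on `Revenue >= c.Revenue` gives every row in a tie the same running total, whereas ROW_NUMBER orders tied rows arbitrarily and accumulates them one at a time. The two approaches agree when revenues are distinct within a user, but can return different rows when they are not. The Python sketch below illustrates the difference; the function names and the revenue figures are hypothetical and chosen only to show a tie.

```python
def running_by_value(revenues):
    """Running totals in the style of Edit 3: sum every revenue >= the current one."""
    return [sum(x for x in revenues if x >= r) for r in revenues]

def running_by_position(revenues):
    """Running totals in the style of ROW_NUMBER: sum the rows at or before this position."""
    return [sum(revenues[: i + 1]) for i in range(len(revenues))]

def kept(revenues, running, threshold=0.8):
    """Keep revenues up to and including the first running share at or above threshold."""
    total = sum(revenues)
    shares = [r / total for r in running]
    cutoff = min(s for s in shares if s >= threshold)
    return [rev for rev, s in zip(revenues, shares) if s <= cutoff]

# Hypothetical revenues for one user, sorted descending, with a two-way tie at 120.
tied = [700, 120, 120, 60]

print(kept(tied, running_by_position(tied)))  # position-based: stops at the first 120
print(kept(tied, running_by_value(tied)))     # value-based: both tied rows are kept
```

With these figures the position-based version keeps two rows and the value-based version keeps three. Either result may be acceptable; which one is wanted depends on whether tied customers should be treated as a group.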
    


Source: the original question on Stack Overflow, "sql - Select a subset of rows that exceed a percentage of the total value": https://stackoverflow.com/questions/36452626/
