sql - How to aggregate by top N categories, with "All Others" and totals?

Tags: sql sql-server tsql sql-server-2017

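I have tables that list a user's sales by category (each sale has at least one category and may have several).

I can get a user's top categories, but I need statistics for the user's top N categories plus everything else.

I have boiled the problem down to the MCVE below.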

MCVE data summary:

Salesman    SaleID    Amount    Categories
--------    ------    ------    ------------------------------
     1         1         2      Service
     2         2         2      Software, Support_Contract
     2         3         3      Service
     2         4         1      Parts, Service, Software
     2         5         3      Support_Contract
     2         6         4      Promo_Gift, Support_Contract
     2         7        -2      Rebate, Support_Contract
     3         8         2      Software, Support_Contract
     3         9         3      Service
     3        10         1      Parts, Software
     3        11         3      Support_Contract
     3        12         4      Promo_Gift, Support_Contract
     3        13        -2      Rebate, Support_Contract

MCVE setup SQL:

CREATE TABLE Sales      ([Salesman] int, [SaleID] int, [Amount] int);
CREATE TABLE SalesTags  ([SaleID] int, [TagId] int);
CREATE TABLE Tags       ([TagId] int, [TagName] varchar(100) );

INSERT INTO Sales
    ([Salesman], [SaleID], [Amount])
VALUES
    (1, 1, 2),        (2, 6, 4),        (3, 10, 1),
    (2, 2, 2),        (2, 7, -2),       (3, 11, 3),
    (2, 3, 3),        (3, 8, 2),        (3, 12, 4),
    (2, 4, 1),        (3, 9, 3),        (3, 13, -2),
    (2, 5, 3)
;
INSERT INTO SalesTags
    ([SaleID], [TagId])
VALUES
    (1, 3),           (6, 4),           (10, 1),
    (2, 1),           (6, 5),           (10, 2),
    (2, 4),           (7, 4),           (11, 4),
    (3, 3),           (7, 6),           (12, 4),
    (4, 1),           (8, 1),           (12, 5),
    (4, 2),           (8, 4),           (13, 4),
    (4, 3),           (9, 3),           (13, 6),
    (5, 4)
;
INSERT INTO Tags
    ([TagId], [TagName])
VALUES
    (1, 'Software'),
    (2, 'Parts'),
    (3, 'Service'),
    (4, 'Support_Contract'),
    (5, 'Promo_Gift'),
    (6, 'Rebate')
;


See this SQL Fiddle. I can get a user's top N tags like so:

WITH usersSales AS (  -- actual base CTE is much more complex
    SELECT  s.SaleID
            , s.Amount
    FROM    Sales s
    WHERE   s.Salesman = 2
)
SELECT Top 3  -- N can be 3 to 10
            t.TagName
            , COUNT (us.SaleID)     AS tagSales
            , SUM (us.Amount)       AS tagAmount
FROM        usersSales us
INNER JOIN  SalesTags st    ON st.SaleID = us.SaleID
INNER JOIN  Tags t          ON t.TagId   = st.TagId
GROUP BY    t.TagName
ORDER BY    tagAmount DESC
            , tagSales DESC
            , t.TagName

This shows that the user's top categories are:

  1. "Support_Contract"
  2. "Service"
  3. "Promo_Gift"

in that order, for user 2. (And Support_Contract, Promo_Gift, Software for user 3.)

But for N=3, the needed results are:

  • User 2:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support_Contract       7             4
    Service                4             2
    Promo_Gift             0             0
    - All Others -         0             0
    ============================================
    Totals                11             6
    
  • User 3:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support_Contract       7             4
    Promo_Gift             0             0
    Software               1             1
    - All Others -         3             1
    ============================================
    Totals                11             6
    

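Where:

  1. Top Category is the user's highest-ranked category for a given sale (per the query above).
  2. The Top Category in row 2 excludes sales already counted in row 1.
  3. The Top Category in row 3 excludes sales already counted in rows 1 and 2.
  4. And so on.
  5. All remaining sales not counted in the top N categories are lumped into the - All Others - group.
  6. The totals at the bottom match the user's overall sales.

How do I aggregate results like this?

Note that this runs on MS SQL Server 2017, and I cannot change the table schema.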

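Best answer

Here is one way. Run the query step by step, CTE by CTE, and examine the intermediate results to understand how it works.

It is not the most efficient approach, because I end up joining the table to itself to eliminate sales that were already summed, but at the moment I don't see how to avoid that.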

WITH usersSales 
AS 
(  -- actual base CTE is much more complex
    SELECT
        s.SaleID
        , s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums
AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank
AS
(
    SELECT
        TagName
        ,Amount
        ,SaleID
        ,TagAmount
        ,TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
,CTE_Final
AS
(
    SELECT
        Main.TagName
        ,Main.Amount
        ,Main.SaleID
        ,Main.TagAmount
        ,Main.TagSales
        ,Main.rnk
        ,ISNULL(A.FinalTagAmount, 0) AS FinalTagAmount
        ,A.FinalTagSales
    FROM
        CTE_Rank AS Main
        OUTER APPLY
        (
            SELECT
                SUM(Detail.Amount) AS FinalTagAmount
                ,COUNT(*) AS FinalTagSales
            FROM CTE_Rank AS Detail
            WHERE
                Detail.rnk = Main.rnk
                AND Detail.SaleID NOT IN
                (
                    SELECT PrevRanks.SaleID
                    FROM CTE_Rank AS PrevRanks
                    WHERE PrevRanks.rnk < Detail.rnk
                )
        ) AS A
)
SELECT
    TagName
    ,MIN(FinalTagAmount) AS FinalTagAmount
    ,MIN(FinalTagSales) AS FinalTagSales
    ,rnk
    ,0 AS SortOrder
FROM CTE_Final
WHERE rnk <= 3
GROUP BY
    TagName
    ,rnk

UNION ALL

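SELECT
    '- All Others -' AS TagName
    ,SUM(FinalTagAmount) AS FinalTagAmount
    ,SUM(FinalTagSales) AS FinalTagSales
    ,0 AS rnk
    ,1 AS SortOrder
FROM
(
    -- CTE_Final repeats a tag's totals on each of its rows, so collapse to
    -- one row per tag before summing; otherwise the totals are multiplied.
    SELECT rnk
        ,MIN(FinalTagAmount) AS FinalTagAmount
        ,MIN(FinalTagSales) AS FinalTagSales
    FROM CTE_Final
    WHERE rnk > 3
    GROUP BY rnk
) AS Others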

ORDER BY
    SortOrder
    ,rnk
;

CTE_Rank

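Don't group and sum the rows yet; instead, use windowed aggregates to get the rank of each tag. Later we will need the individual rows (SaleID) with their individual amounts in order to filter which rows are counted.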

+------------------+--------+--------+-----------+----------+-----+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk |
+------------------+--------+--------+-----------+----------+-----+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |
| Service          |      1 |      4 |         4 |        2 |   2 |
| Service          |      3 |      3 |         4 |        2 |   2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |
| Software         |      1 |      4 |         3 |        2 |   4 |
| Software         |      2 |      2 |         3 |        2 |   4 |
| Parts            |      1 |      4 |         1 |        1 |   5 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |
+------------------+--------+--------+-----------+----------+-----+
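
To reproduce the intermediate result above, one option is to keep the CTE chain up to CTE_Rank and select from it directly. This is only a sketch of that inspection step; the final ORDER BY is an addition for readable output and is not part of the original query.

```sql
-- Sketch: run the chain only as far as CTE_Rank and look at its rows.
WITH usersSales AS
(
    SELECT s.SaleID, s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
    FROM usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank AS
(
    SELECT
        TagName, Amount, SaleID, TagAmount, TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
SELECT *
FROM CTE_Rank
ORDER BY rnk, SaleID;   -- without this, row order is not guaranteed
```

The same pattern works for any other CTE in the chain: keep every CTE it depends on and replace the final SELECT.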

CTE_Final

The OUTER APPLY does the main calculation: for each tag, it filters out sales that were already encountered in higher-ranked tags.

+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk | FinalTagAmount | FinalTagSales |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |              7 |             4 |
| Service          |      1 |      4 |         4 |        2 |   2 |              4 |             2 |
| Service          |      3 |      3 |         4 |        2 |   2 |              4 |             2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |              0 |             0 |
| Software         |      1 |      4 |         3 |        2 |   4 |              0 |             0 |
| Software         |      2 |      2 |         3 |        2 |   4 |              0 |             0 |
| Parts            |      1 |      4 |         1 |        1 |   5 |              0 |             0 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |              0 |             0 |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
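
Another way to read what the OUTER APPLY computes: every sale is credited to the best-ranked tag it carries. The sketch below expresses that view directly and may avoid the self-join. It is not part of the original answer. It assumes tag names are unique and that each (SaleID, TagId) pair appears once in SalesTags; the names @N, TagTotals, SaleBest and bestRnk are hypothetical.

```sql
-- Sketch: credit each sale to the best-ranked tag it carries.
DECLARE @N int = 3;   -- hypothetical parameter: number of top tags to list

WITH usersSales AS
(
    SELECT s.SaleID, s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,TagTotals AS   -- one row per tag, ranked with the same ordering as CTE_Rank
(
    SELECT
        st.TagId
        ,t.TagName
        ,ROW_NUMBER() OVER (ORDER BY SUM(us.Amount) DESC, COUNT(*) DESC, t.TagName) AS rnk
    FROM usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
    GROUP BY st.TagId, t.TagName
)
,SaleBest AS    -- one row per sale: the smallest (best) rank among its tags
(
    SELECT us.SaleID, us.Amount, MIN(tt.rnk) AS bestRnk
    FROM usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN TagTotals tt ON tt.TagId = st.TagId
    GROUP BY us.SaleID, us.Amount
)
SELECT
    tt.TagName
    ,ISNULL(SUM(sb.Amount), 0) AS FinalTagAmount
    ,COUNT(sb.SaleID) AS FinalTagSales   -- 0 when no sale is left for the tag
    ,tt.rnk
FROM TagTotals tt
    LEFT JOIN SaleBest sb ON sb.bestRnk = tt.rnk
WHERE tt.rnk <= @N
GROUP BY tt.TagName, tt.rnk

UNION ALL

-- Every sale whose best tag falls outside the top @N
SELECT '- All Others -', ISNULL(SUM(Amount), 0), COUNT(*), @N + 1
FROM SaleBest
WHERE bestRnk > @N

ORDER BY rnk;
```

Traced by hand against the sample data, this gives 7/4, 4/2, 0/0 and 0/0 for salesman 2, and 7/4, 0/0, 1/1 and 3/1 for salesman 3, which matches the results requested in the question. Whether it actually performs better than the OUTER APPLY version would need to be checked against the real base CTE.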

Query result

Simply put the top 3 tags together with all the others.

+------------------+----------------+---------------+-----+-----------+
|     TagName      | FinalTagAmount | FinalTagSales | rnk | SortOrder |
+------------------+----------------+---------------+-----+-----------+
| Support_Contract |              7 |             4 |   1 |         0 |
| Service          |              4 |             2 |   2 |         0 |
| Promo_Gift       |              0 |             0 |   3 |         0 |
| - All Others -   |              0 |             0 |   0 |         1 |
+------------------+----------------+---------------+-----+-----------+

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/51256997/
