sql-server - How to concatenate strings in a GROUP BY in SQL Server without a subquery or an extra query?

Tags: sql-server group-by aggregate-functions sql-server-group-concat

I'm looking for an equivalent of MySQL's GROUP_CONCAT() function in SQL Server 2012 that does not use a subquery. Here is the setup:

CREATE TABLE Temp
( 
ID INT PRIMARY KEY NOT NULL IDENTITY(1,1),
ColA varchar(900) NULL,
ColB varchar(900) NULL
)

INSERT INTO Temp (ColA, ColB)
SELECT 'A', 'some' UNION ALL
SELECT 'A', 'thing' UNION ALL
SELECT 'A', 'and' UNION ALL
SELECT 'B', 'some' UNION ALL
SELECT 'B', 'more' UNION ALL
SELECT 'B', 'and' UNION ALL
SELECT 'B', 'more' UNION ALL
SELECT 'C', 'things' UNION ALL
SELECT 'C', 'things'

-- Desired Output. Note that the lists are in descending order of frequency ('more' appears twice)
ColA, Frequency, ColBs
'B', 4, 'more, some, and'
'A', 3, 'some, thing, and'
'C', 2, 'things'

SELECT 
    ColA, 
    COUNT(*) as Frequency, 
    GROUP_CONCAT(ColB) --Would be nice
FROM Temp
GROUP BY ColA
ORDER BY Frequency DESC

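The usual answer to this in SQL Server is to use STUFF() over a correlated subquery. In my case the performance is simply unacceptable (200 million records; 26 seconds per subquery × 200 million = roughly 164 years).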

SELECT
    ColA,
    COUNT(*) AS Frequency,
    ISNULL(
        STUFF((
            -- one ', value' fragment per distinct ColB, most frequent first
            SELECT ', ' + sub.ColB
            FROM Temp sub
            WHERE sub.ColA = t.ColA
            GROUP BY sub.ColB
            ORDER BY COUNT(*) DESC
            FOR XML PATH('')
        ), 1, 2, ''),   -- STUFF removes the leading ', '
    '') AS ColBs        -- Would take 164 years on the entire data set
FROM Temp t
GROUP BY ColA
ORDER BY Frequency DESC

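The desired output is, for each distinct ColA, its ColB values grouped together in descending order of occurrence, as shown above. However, this has to be done as a single query that makes one pass over the table.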

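Do I need to build this myself and abandon the GROUP BY? Iterate over the data set manually and build a new table in a console application? Or am I missing something?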
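If the work does move into a console application, the aggregation needs only one sequential pass over the rows. The Python sketch below is an illustration under stated assumptions, not part of the original post: it assumes the rows arrive as (ColA, ColB) pairs (for example streamed from a plain `SELECT ColA, ColB FROM Temp`), and the function name `group_concat_by_frequency` is hypothetical. Memory grows with the number of distinct (ColA, ColB) pairs rather than the number of rows. Ties keep first-seen order, which matches the desired output above.

```python
from collections import Counter, defaultdict


def group_concat_by_frequency(rows):
    """Hypothetical client-side GROUP_CONCAT: one pass over (ColA, ColB) rows.

    Returns (ColA, Frequency, ColBs) tuples, most frequent ColA first.
    ColBs lists each distinct ColB once, most frequent first; ties keep
    first-seen order (Counter preserves insertion order, sorted() is stable).
    """
    per_group = defaultdict(Counter)  # ColA -> Counter of its ColB values
    for col_a, col_b in rows:         # the single pass over the data
        per_group[col_a][col_b] += 1

    result = []
    for col_a, counts in per_group.items():
        ordered = sorted(counts, key=lambda b: -counts[b])
        result.append((col_a, sum(counts.values()), ", ".join(ordered)))
    result.sort(key=lambda r: -r[1])  # equivalent of ORDER BY Frequency DESC
    return result


# Sample data from the question
rows = [
    ("A", "some"), ("A", "thing"), ("A", "and"),
    ("B", "some"), ("B", "more"), ("B", "and"), ("B", "more"),
    ("C", "things"), ("C", "things"),
]

for row in group_concat_by_frequency(rows):
    print(row)
# ('B', 4, 'more, some, and')
# ('A', 3, 'some, thing, and')
# ('C', 2, 'things')
```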

Best answer

Try this:

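;WITH prelim
AS
(
   SELECT
     cola
    ,colb
    ,COUNT(*) AS recs
    -- position of each colb within its cola: most frequent first, ties broken alphabetically
    ,ROW_NUMBER() OVER (PARTITION BY cola ORDER BY COUNT(*) DESC, colb) AS recno
    -- number of distinct colb values per cola, i.e. the last position to reach
    ,COUNT(*) OVER (PARTITION BY cola) AS cnt
  FROM Temp
  GROUP BY cola, colb ),
Group_Concat (recno, cnt, recs, cola, colbs)
AS
(
-- anchor: start each cola's list with its most frequent colb
SELECT
    recno
    ,cnt
    ,recs
    ,cola
    ,CAST(colb AS varchar(MAX)) AS colbs
FROM
    prelim
WHERE
    recno = 1
UNION ALL
-- recursive step: append the next colb and add its count to the running total
SELECT
    p.recno
    ,p.cnt
    ,g.recs + p.recs
    ,p.cola
    ,g.colbs + ', ' + CAST(p.colb AS varchar(MAX)) AS colbs
FROM
    prelim p
    JOIN Group_Concat g ON p.cola = g.cola AND p.recno = g.recno + 1
)

-- keep only the row that has appended the last colb for each cola
SELECT cola, recs AS Frequency, colbs AS ColBs
FROM Group_Concat
WHERE recno = cnt
ORDER BY Frequency DESC
OPTION (MAXRECURSION 0)  -- the default limit of 100 levels fails for a cola with more than ~100 distinct colb values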
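To check the recursive CTE's logic without a SQL Server instance, the following sketch (an assumption, not part of the original answer) runs an adaptation of it on SQLite through Python's built-in sqlite3 module, assuming SQLite 3.25 or later for window functions. Adaptations: `WITH RECURSIVE`, `||` for concatenation, `TEXT` instead of `varchar(MAX)`, per-pair counts computed in a derived table before numbering, no `MAXRECURSION` hint, the table renamed `temp_data` (TEMP is an SQLite keyword) and the CTE renamed `group_concat_cte`. Results are sorted by frequency descending, as the question asks.

```python
import sqlite3

# Adaptation of the recursive CTE for SQLite (assumed stand-in for SQL Server 2012).
QUERY = """
WITH RECURSIVE
prelim AS (
    SELECT cola, colb, recs,
           -- most frequent colb first, ties broken alphabetically
           ROW_NUMBER() OVER (PARTITION BY cola ORDER BY recs DESC, colb) AS recno,
           -- distinct colb values per cola
           COUNT(*) OVER (PARTITION BY cola) AS cnt
    FROM (SELECT cola, colb, COUNT(*) AS recs
          FROM temp_data
          GROUP BY cola, colb) AS pairs
),
group_concat_cte (recno, cnt, recs, cola, colbs) AS (
    SELECT recno, cnt, recs, cola, CAST(colb AS TEXT)
    FROM prelim
    WHERE recno = 1
    UNION ALL
    SELECT p.recno, p.cnt, g.recs + p.recs, p.cola, g.colbs || ', ' || p.colb
    FROM prelim p
    JOIN group_concat_cte g ON p.cola = g.cola AND p.recno = g.recno + 1
)
SELECT cola, recs AS frequency, colbs
FROM group_concat_cte
WHERE recno = cnt
ORDER BY frequency DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_data (id INTEGER PRIMARY KEY, cola TEXT, colb TEXT)")
rows = [
    ("A", "some"), ("A", "thing"), ("A", "and"),
    ("B", "some"), ("B", "more"), ("B", "and"), ("B", "more"),
    ("C", "things"), ("C", "things"),
]
conn.executemany("INSERT INTO temp_data (cola, colb) VALUES (?, ?)", rows)
result = conn.execute(QUERY).fetchall()
print(result)
# [('B', 4, 'more, and, some'), ('A', 3, 'and, some, thing'), ('C', 2, 'things')]
```

Because ties are broken alphabetically by colb, A comes out as 'and, some, thing' and B as 'more, and, some', not in the first-seen order shown in the question's desired output. One option if that order matters is to also carry MIN(ID) per (cola, colb) pair and use it as the tie-breaker in ROW_NUMBER().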

This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/20276002/
