sql - How to optimize a SQL query that uses CTEs and GROUP BY and runs for 20+ hours?

Tags: sql sql-server azure query-optimization

I have a query that first performs joins inside a CTE, then joins that CTE to other tables, aggregates with GROUP BY, and takes the top 15 records.

WITH join_table as (
    SELECT pl.yearmonth, 
    pl.pdid, 
    pl.OrderId,
    pl.OrderNo AS OrderNumber, 
    pl.pmc,
    pl.pmcode AS pmaa,
    pl.spid,
    pl.issuingdate,
    pl.plguid, 
    pusa.SizeCode,
    pusa.atnumber,
    pu.tpc, 
    pu.ItemQty AS puItemQty
    FROM table1 pl 
    JOIN table3 pusa ON pl.plguid=pusa.plguid
    JOIN table2 pu ON pusa.plguid=pu.plguid AND pusa.puguid=pu.puguid
    WHERE pl.pmode='BBB'
      and pl.pltype='CCC'
      and pl.plstatus='AAA'
      and pu.ctcode='S'
      AND pu.ctype='DDD'
      AND pu.tpc <> 'EEE' 
),
get_top_15PM as (
    SELECT TOP 15 pl.pmc, sum(cast(puItemQty as BIGINT)) AS SumpuItemQty
    FROM join_table pl
    join abctable b on b.pdid= pl.pdid and b.spid= pl.spid
    WHERE pl.issuingdate > '2021-8-1'
    and prodtypeid in (select prodtypeid from prodtype where prodgrpid in (3,4,5,7,8,9,10,13,20))
    group by pl.pmc
    ORDER BY SumpuItemQty DESC
)
SELECT DISTINCT 
    pl.yearmonth, 
    pl.pdid, 
    pl.OrderId,
    pl.OrderNumber, 
    pl.pmc,
    pl.pmaa,
    pl.spid,
    pl.issuingdate, 
    pl.atnumber,
    pl.tpc, 
    pl.puItemQty,
    s.SizeName AS Size,
    s.SizeLength,
    s.SizeWidth,
    s.SizeHeight,
    s.SizeVolume,
    s.SizeWeight
FROM join_table pl 
JOIN PLSize s ON pl.plguid=s.plguid AND pl.SizeCode=s.SizeCode
WHERE  pl.pmc IN ( SELECT pmc from get_top_15PM) 
      AND pl.yearmonth>=202100

This query takes more than 20 hours to run on an Azure SQL database.

Details:

Azure database pricing tier: S6 with 750 GB of storage, of which 20% remains unused

TableName   rows         TotalSpaceGB   UsedSpaceGB UnusedSpaceGB
table2      332,318,173  117.72         117.71      0.01
table3      153,700,352  60.78          60.76       0.01
table1      15,339,815   13.21          13.20       0.01
abctable    1,232,868    0.81           0.80        0.00

Estimated execution plan with query

Actual execution plan with query

Wait type: (14ms)PAGEIOLATCH_SH:dev-db:1(*), or sometimes NULL. I got this from sp_WhoIsActive.
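
For readers who do not have sp_WhoIsActive installed, roughly the same wait information can be read from the built-in sys.dm_exec_requests view. This is a minimal sketch, not the command used in the question. PAGEIOLATCH_SH means the session is waiting for a data page to be read from disk into memory, which is consistent with large scans on a tier with limited I/O.

```sql
-- Show what every other active request is currently waiting on.
-- Requires VIEW DATABASE STATE permission on Azure SQL Database.
SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,          -- e.g. PAGEIOLATCH_SH while pages are read from disk
       r.wait_time,          -- milliseconds spent in the current wait
       r.last_wait_type,
       r.wait_resource,
       r.total_elapsed_time  -- milliseconds since the request started
FROM sys.dm_exec_requests AS r
WHERE r.session_id <> @@SPID;  -- exclude this monitoring session
```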

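Notes:

The actual execution plan was taken from the running query, as described in this SO answer.

The tables have clustered indexes on their primary keys and nonclustered indexes on other columns, none of which I use in filters, joins, or ORDER BY.

table1 is the parent of table2, and table2 is the parent of table3.

Any comments or suggestions are much appreciated.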

Update 1

--table1
index_name          index_description                 index_keys
idx_table1_pmcode   nonclustered                      pmcode
PK_table1           clustered, unique, primary key    plguid

--table2
index_name          index_description                 index_keys
IX_table2_plguid    nonclustered                      plguid
PK_table2           clustered, unique, primary key    puguid

--table3
index_name          index_description                 index_keys
IX_table3_plguid    nonclustered                      plguid
PK_table3           clustered, unique, primary key    pusguid

--abctable
index_name             index_description               index_keys
nci_wi_abctable_2BC1D  nonclustered                    dpyearmonth
nci_wi_abctable_FAA89  nonclustered                    dpnumber, ptid
PK_abctable            clustered, unique, primary key  OrderId

DDL

--table1
SET ANSI_PADDING ON
GO

CREATE NONCLUSTERED INDEX [idx_table1_pmcode] ON [dbo].[table1]
(
    [pmcode] ASC
)
INCLUDE([OrderID],[OrderNo],[yearmonth],[pdid]) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

--table2
SET ANSI_PADDING ON
GO

CREATE NONCLUSTERED INDEX [IX_table2_plguid] ON [dbo].[table2]
(
    [plguid] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

--table3
SET ANSI_PADDING ON
GO

CREATE NONCLUSTERED INDEX [IX_table3_plguid] ON [dbo].[table3]
(
    [plguid] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

--abctable
SET ANSI_PADDING ON
GO


CREATE NONCLUSTERED INDEX [nci_wi_abctable_2BC1D] ON [dbo].[abctable]
(
    [dpyearmonth] ASC
)
INCLUDE([pduid],[pdid]) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [nci_wi_abctable_FAA89] ON [dbo].[abctable]
(
    [dpyearmonth] ASC,
    [ptid] ASC
)
INCLUDE([OrderStatus],[spid]) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

Best answer

Apart from index improvements, you can probably improve this query significantly by using window functions.
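
The answer does not spell out which index improvements it has in mind. As one possible reading, the sketch below assumes that the existing single-column indexes on plguid do not cover the query and proposes indexes matching its joins and filters. The index names are made up, the key order is a guess that should be checked against the actual execution plan, and building indexes on tables of this size will itself take considerable time and space.

```sql
-- Sketch only: hypothetical index names; validate against the real plan first.

-- table1: equality filters first, then the date filter; selected columns included.
-- plguid is the clustered key, so it is carried in the index automatically.
CREATE NONCLUSTERED INDEX IX_table1_filter_sketch
ON dbo.table1 (pmode, pltype, plstatus, issuingdate)
INCLUDE (yearmonth, pdid, OrderId, OrderNo, pmc, pmcode, spid);

-- table3: supports the join on plguid and carries puguid for the next join.
CREATE NONCLUSTERED INDEX IX_table3_plguid_puguid_sketch
ON dbo.table3 (plguid, puguid)
INCLUDE (SizeCode, atnumber);

-- table2: join columns as keys; filter and output columns included.
CREATE NONCLUSTERED INDEX IX_table2_plguid_puguid_sketch
ON dbo.table2 (plguid, puguid)
INCLUDE (ctcode, ctype, tpc, ItemQty);

-- abctable: supports the join on (pdid, spid). If prodtypeid lives in this
-- table (the query does not qualify it), it could be added as an INCLUDE column.
CREATE NONCLUSTERED INDEX IX_abctable_pdid_spid_sketch
ON dbo.abctable (pdid, spid);
```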

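Note how join_table is referenced only once in the rewritten query below. We take a windowed sum partitioned by pmc, then number the rows with DENSE_RANK, and finally keep the rows numbered 15 or lower.

You should also remove DISTINCT and ensure your rows are unique in some other way, for example by constraining your joins more tightly or by using a row number.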
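
To illustrate the row-number alternative to DISTINCT, here is a minimal, self-contained sketch on toy data. The table variable, the SizeId column, and the rule for choosing which duplicate to keep are all assumptions; the question does not say where the duplicates come from, so the right partition and ordering columns have to be worked out against the real data.

```sql
-- Hypothetical toy data: two rows share the same (plguid, SizeCode).
DECLARE @sizes TABLE (
    plguid   int        NOT NULL,
    SizeCode varchar(5) NOT NULL,
    SizeId   int        NOT NULL   -- assumed tiebreaker column
);

INSERT INTO @sizes (plguid, SizeCode, SizeId)
VALUES (1, 'S', 101),
       (1, 'S', 102),   -- duplicate key; would multiply rows in a join
       (2, 'M', 201);

WITH numbered AS (
    SELECT plguid, SizeCode, SizeId,
           -- Number the rows within each key; rn = 1 is the row we keep
           ROW_NUMBER() OVER (PARTITION BY plguid, SizeCode ORDER BY SizeId) AS rn
    FROM @sizes
)
SELECT plguid, SizeCode, SizeId
FROM numbered
WHERE rn = 1;   -- returns (1, 'S', 101) and (2, 'M', 201)
```

Unlike DISTINCT, this avoids comparing every output column across the whole result set, and it makes explicit which of the duplicate rows survives.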

WITH join_table as (
  SELECT
    pl.yearmonth, 
    pl.pdid, 
    pl.OrderId,
    pl.OrderNo AS OrderNumber, 
    pl.pmc,
    pl.pmcode AS pmaa,
    pl.spid,
    pl.issuingdate,
    pl.plguid, 
    pusa.SizeCode,
    pusa.atnumber,
    pu.tpc, 
    pu.ItemQty AS puItemQty,
    SUM(CAST(pu.ItemQty AS bigint)) OVER (PARTITION BY pl.pmc) AS SumpuItemQty
  FROM table1 pl 
  JOIN table3 pusa ON pl.plguid=pusa.plguid
  JOIN table2 pu ON pusa.plguid=pu.plguid AND pusa.puguid=pu.puguid
  WHERE pl.pmode='BBB'
    and pl.pltype='CCC'
    and pl.plstatus='AAA'
    and pu.ctcode='S'
    AND pu.ctype='DDD'
    AND pu.tpc <> 'EEE' 
    AND pl.issuingdate >= '20210801'
    AND pl.yearmonth >= 202100
),
get_top_15PM as (
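    SELECT pl.*,  -- pl.* rather than *: abctable also has pdid and spid, and a CTE cannot expose duplicate column names
      -- Rank across all pmc values; the windowed sum is constant within one pmc
      DENSE_RANK() OVER (ORDER BY pl.SumpuItemQty DESC) AS dr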
    FROM join_table pl
    JOIN abctable b on b.pdid = pl.pdid and b.spid = pl.spid
    WHERE prodtypeid in (
        select pd.prodtypeid
        from prodtype pd
        where prodgrpid in (3,4,5,7,8,9,10,13,20)
    )
)
SELECT
    pl.yearmonth, 
    pl.pdid, 
    pl.OrderId,
    pl.OrderNumber, 
    pl.pmc,
    pl.pmaa,
    pl.spid,
    pl.issuingdate, 
    pl.atnumber,
    pl.tpc, 
    pl.puItemQty,
    s.SizeName AS Size,
    s.SizeLength,
    s.SizeWidth,
    s.SizeHeight,
    s.SizeVolume,
    s.SizeWeight
FROM get_top_15PM pl 
JOIN PLSize s ON pl.plguid = s.plguid AND pl.SizeCode = s.SizeCode
WHERE pl.dr <= 15
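
The windowed-sum-then-rank pattern can be tried in isolation on a toy table. The sketch below uses made-up data and a table variable in place of join_table. It computes a per-pmc total on every row, ranks the groups against each other by that total, and keeps the detail rows of the top two groups. The pmc tiebreaker in the ORDER BY is an addition so that two groups with equal totals get different ranks; without it, tied groups share a rank and more than the requested number of groups can come back.

```sql
-- Toy data standing in for join_table; names and values are made up.
DECLARE @sales TABLE (pmc varchar(10) NOT NULL, ItemQty int NOT NULL);

INSERT INTO @sales (pmc, ItemQty)
VALUES ('A', 10), ('A', 5),   -- A totals 15
       ('B', 20),             -- B totals 20
       ('C', 1),  ('C', 2);   -- C totals 3

WITH summed AS (
    SELECT pmc,
           ItemQty,
           -- Per-group total, repeated on every row of the group
           SUM(CAST(ItemQty AS bigint)) OVER (PARTITION BY pmc) AS SumItemQty
    FROM @sales
),
ranked AS (
    SELECT pmc, ItemQty, SumItemQty,
           -- No PARTITION BY here: the groups are ranked against each other
           DENSE_RANK() OVER (ORDER BY SumItemQty DESC, pmc) AS dr
    FROM summed
)
SELECT pmc, ItemQty, SumItemQty, dr
FROM ranked
WHERE dr <= 2          -- keep the detail rows of the top 2 groups
ORDER BY dr, ItemQty DESC;
-- Rows returned: (B, 20, 20, 1), (A, 10, 15, 2), (A, 5, 15, 2)
```

Because the base rows are read once and both the total and the rank are computed in the same pass, this avoids evaluating the expensive join twice, which is the point of the rewrite.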

Regarding "sql - How to optimize a SQL query that uses CTEs and GROUP BY and runs for 20+ hours?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76817390/
