I have a query that first performs joins inside a CTE, then joins that CTE to other tables, aggregates with a GROUP BY, and takes the top 15 records.
WITH join_table as (
SELECT pl.yearmonth,
pl.pdid,
pl.OrderId,
pl.OrderNo AS OrderNumber,
pl.pmc,
pl.pmcode AS pmaa,
pl.spid,
pl.issuingdate,
pl.plguid,
pusa.SizeCode,
pusa.atnumber,
pu.tpc,
pu.ItemQty AS puItemQty
FROM table1 pl
JOIN table3 pusa ON pl.plguid=pusa.plguid
JOIN table2 pu ON pusa.plguid=pu.plguid AND pusa.puguid=pu.puguid
WHERE pl.pmode='BBB'
and pl.pltype='CCC'
and pl.plstatus='AAA'
and pu.ctcode='S'
AND pu.ctype='DDD'
AND pu.tpc <> 'EEE'
),
get_top_15PM as (
SELECT TOP 15 pl.pmc, sum(cast(puItemQty as BIGINT)) AS SumpuItemQty
FROM join_table pl
join abctable b on b.pdid= pl.pdid and b.spid= pl.spid
WHERE pl.issuingdate > '2021-8-1'
and prodtypeid in (select prodtypeid from prodtype where prodgrpid in (3,4,5,7,8,9,10,13,20))
group by pl.pmc
ORDER BY SumpuItemQty DESC
)
SELECT DISTINCT
pl.yearmonth,
pl.pdid,
pl.OrderId,
pl.OrderNumber,
pl.pmc,
pl.pmaa,
pl.spid,
pl.issuingdate,
pl.atnumber,
pl.tpc,
pl.puItemQty,
s.SizeName AS Size,
s.SizeLength,
s.SizeWidth,
s.SizeHeight,
s.SizeVolume,
s.SizeWeight
FROM join_table pl
JOIN PLSize s ON pl.plguid=s.plguid AND pl.SizeCode=s.SizeCode
WHERE pl.pmc IN ( SELECT pmc from get_top_15PM)
AND pl.yearmonth>=202100
This query takes more than 20 hours to run on Azure SQL Database.
Details:
Azure database pricing tier: S6 with 750 GB of storage, of which 20% is still unused
TableName   rows          TotalSpaceGB   UsedSpaceGB   UnusedSpaceGB
table2      332,318,173   117.72         117.71        0.01
table3      153,700,352   60.78          60.76         0.01
table1      15,339,815    13.21          13.20         0.01
abctable    1,232,868     0.81           0.80          0.00
Wait type: (14ms)PAGEIOLATCH_SH:dev-db:1(*), or sometimes NULL. I got this from sp_WhoIsActive.
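PAGEIOLATCH_SH means a session is waiting for a data page to be read from disk into memory. As a rough cross-check that does not depend on sp_WhoIsActive, the same wait information can be read from the built-in DMVs. This is only a sketch; the column aliases are illustrative, and the second statement applies to Azure SQL Database only.

```sql
-- Current wait and read counters for every running request except this session.
SELECT r.session_id,
       r.status,
       r.wait_type,
       r.wait_time          AS wait_time_ms,
       r.last_wait_type,
       r.total_elapsed_time AS elapsed_ms,
       r.reads,
       r.logical_reads,
       t.text               AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID;

-- Azure SQL Database: recent resource usage as a percentage of the tier's limits.
-- A consistently high avg_data_io_percent would fit the PAGEIOLATCH_SH waits.
SELECT TOP (20)
       end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```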
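Notes:
The actual execution plan was captured from the running query, as shown in this SO answer.
The tables have a clustered index on the primary key and nonclustered indexes on other fields; I am not using those nonclustered indexes in the filters, joins, or ORDER BY.
table1 is the parent of table2, and table2 is the parent of table3.
Any comments or suggestions are greatly appreciated.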
Update 1
--table1
index_name              index_description                index_keys
idx_table1_pmcode       nonclustered                     pmcode
PK_table1               clustered, unique, primary key   plguid
--table2
index_name              index_description                index_keys
IX_table2_plguid        nonclustered                     plguid
PK_table2               clustered, unique, primary key   puguid
--table3
index_name              index_description                index_keys
IX_table3_plguid        nonclustered                     plguid
PK_table3               clustered, unique, primary key   pusguid
--abctable
index_name              index_description                index_keys
nci_wi_abctable_2BC1D   nonclustered                     dpyearmonth
nci_wi_abctable_FAA89   nonclustered                     dpnumber, ptid
PK_abctable             clustered, unique, primary key   OrderId
DDL
--table1
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [idx_table1_pmcode] ON [dbo].[table1]
(
[pmcode] ASC
)
INCLUDE([OrderID],[OrderNo],[yearmonth],[pdid]) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
--table2
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [IX_table2_plguid] ON [dbo].[table2]
(
[plguid] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
--table3
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [IX_table3_plguid] ON [dbo].[table3]
(
[plguid] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
--abctable
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [nci_wi_abctable_2BC1D] ON [dbo].[abctable]
(
[dpyearmonth] ASC
)
INCLUDE([pduid],[pdid]) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [nci_wi_abctable_FAA89] ON [dbo].[abctable]
(
[dpyearmonth] ASC,
[ptid] ASC
)
INCLUDE([OrderStatus],[spid]) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
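Best answer
Apart from index improvements, you can significantly improve this query by using window functions.
Note how join_table is referenced only once in this query. We compute a windowed sum per pmc, then number the rows using DENSE_RANK, and finally keep the rows numbered 15 or less.
You should also remove the DISTINCT and make sure your rows are unique some other way, for example by restricting your joins more tightly or by using a row number.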
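To see the ranking pattern in isolation, here is a minimal sketch on made-up rows. The CTE names (lines, summed, ranked), the qty column, and the values are hypothetical; only the combination of a windowed SUM with DENSE_RANK comes from the answer.

```sql
-- Made-up rows: a group key (pmc) and a quantity per line.
WITH lines AS (
    SELECT v.pmc, v.qty
    FROM (VALUES
        ('A', 10), ('A', 5),
        ('B', 30),
        ('C', 7),  ('C', 1)
    ) AS v(pmc, qty)
),
summed AS (
    -- Windowed sum: every row carries the total of its own pmc group.
    SELECT pmc, qty,
           SUM(CAST(qty AS bigint)) OVER (PARTITION BY pmc) AS SumQty
    FROM lines
),
ranked AS (
    -- Rank the group totals against each other; all rows of one pmc
    -- have the same total and therefore the same rank.
    SELECT pmc, qty, SumQty,
           DENSE_RANK() OVER (ORDER BY SumQty DESC) AS dr
    FROM summed
)
SELECT pmc, qty, SumQty, dr
FROM ranked
WHERE dr <= 2          -- keep every row belonging to the top 2 groups
ORDER BY dr, pmc;
```

With these values the totals are B = 30, A = 15, and C = 8, so the query keeps the single B row and both A rows. Groups with equal totals share a rank, so a filter like dr <= 15 can return more than 15 groups when there are ties. The full rewrite of the original query follows.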
WITH join_table as (
SELECT
pl.yearmonth,
pl.pdid,
pl.OrderId,
pl.OrderNo AS OrderNumber,
pl.pmc,
pl.pmcode AS pmaa,
pl.spid,
pl.issuingdate,
pl.plguid,
pusa.SizeCode,
pusa.atnumber,
pu.tpc,
pu.ItemQty AS puItemQty,
SUM(CAST(pu.ItemQty AS bigint)) OVER (PARTITION BY pl.pmc) AS SumpuItemQty
FROM table1 pl
JOIN table3 pusa ON pl.plguid=pusa.plguid
JOIN table2 pu ON pusa.plguid=pu.plguid AND pusa.puguid=pu.puguid
WHERE pl.pmode='BBB'
and pl.pltype='CCC'
and pl.plstatus='AAA'
and pu.ctcode='S'
AND pu.ctype='DDD'
AND pu.tpc <> 'EEE'
AND pl.issuingdate >= '20210801'
AND pl.yearmonth >= 202100
),
get_top_15PM as (
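-- pl.* rather than *: abctable also has pdid and spid, and a CTE cannot expose duplicate column names
SELECT pl.*,
-- rank across all pmc groups; partitioning by pmc here would give every row rank 1
DENSE_RANK() OVER (ORDER BY pl.SumpuItemQty DESC) AS dr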
FROM join_table pl
JOIN abctable b on b.pdid = pl.pdid and b.spid = pl.spid
WHERE prodtypeid in (
select pd.prodtypeid
from prodtype pd
where prodgrpid in (3,4,5,7,8,9,10,13,20)
)
)
SELECT
pl.yearmonth,
pl.pdid,
pl.OrderId,
pl.OrderNumber,
pl.pmc,
pl.pmaa,
pl.spid,
pl.issuingdate,
pl.atnumber,
pl.tpc,
pl.puItemQty,
s.SizeName AS Size,
s.SizeLength,
s.SizeWidth,
s.SizeHeight,
s.SizeVolume,
s.SizeWeight
FROM get_top_15PM pl
JOIN PLSize s ON pl.plguid = s.plguid AND pl.SizeCode = s.SizeCode
WHERE pl.dr <= 15
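The answer mentions index improvements without listing any. One possible starting point, inferred only from the joins and filters in the query and not from a measured execution plan, might look like the sketch below. The index names are hypothetical, and prodtypeid is left out because the query does not say which table it belongs to.

```sql
-- Hypothetical index names; key and INCLUDE columns are guesses based on the query text.

-- table1: equality filters first, then the range column.
-- plguid is the clustered key, so it is carried in the index automatically.
CREATE NONCLUSTERED INDEX IX_table1_filter_sketch
ON dbo.table1 (pmode, pltype, plstatus, issuingdate)
INCLUDE (yearmonth, pdid, OrderId, OrderNo, pmc, pmcode, spid);

-- table3: both join columns as keys, selected columns included.
CREATE NONCLUSTERED INDEX IX_table3_join_sketch
ON dbo.table3 (plguid, puguid)
INCLUDE (SizeCode, atnumber);

-- table2: both join columns as keys; filter and output columns included
-- so the join does not need lookups into the clustered index.
CREATE NONCLUSTERED INDEX IX_table2_join_sketch
ON dbo.table2 (plguid, puguid)
INCLUDE (ctcode, ctype, tpc, ItemQty);

-- abctable: supports the join on pdid and spid.
CREATE NONCLUSTERED INDEX IX_abctable_join_sketch
ON dbo.abctable (pdid, spid);
```

Indexes on tables this large take considerable time and space to build, which matters with only 20% of the storage free, so it is worth comparing execution plans before and after each one rather than creating them all at once.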
Source: "How to optimize a SQL query that uses a CTE and GROUP BY and runs for 20+ hours?" on Stack Overflow: https://stackoverflow.com/questions/76817390/