performance - 参数变量导致查询慢,但为什么呢?

标签 performance sql-server-2008 select join parameters

我有 3 个版本的查询,最终返回相同的结果。

在向相对较小的表和 where 子句中使用参数变量的情况下添加额外的内部连接时,其中一个会显着变慢。

快速查询和慢速查询(包含在每个查询下方)的执行计划非常不同。

我想了解为什么会发生这种情况以及如何预防。

此查询需要 < 1 秒。它没有额外的内部连接,但它在 where 子句中使用参数变量。

    declare @start datetime = '20120115'
    declare @end datetime = '20120116'

    select distinct sups.campaignid 
    from tblSupporterMainDetails sups
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between @start and @end

  |--Parallelism(Gather Streams)
       |--Sort(DISTINCT ORDER BY:([sups].[campaignID] ASC))
            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
                 |--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
                      |--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
                           |--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1004]))
                           |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
                           |         |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
                           |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
                                |--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]),  WHERE:(PROBE([Bitmap1004],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))

此查询需要 < 1 秒。它有一个额外的内部连接,但在 where 子句中使用参数常量。
    select distinct camps.campaignid 
    from tblCampaigns camps
    inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between '20120115' and '20120116'

  |--Parallelism(Gather Streams)
       |--Hash Match(Right Semi Join, HASH:([sups].[campaignID])=([camps].[campaignID]))
            |--Bitmap(HASH:([sups].[campaignID]), DEFINE:([Bitmap1007]))
            |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[campaignID]))
            |         |--Hash Match(Partial Aggregate, HASH:([sups].[campaignID]))
            |              |--Hash Match(Inner Join, HASH:([calls].[supporterID])=([sups].[supporterID]))
            |                   |--Bitmap(HASH:([calls].[supporterID]), DEFINE:([Bitmap1006]))
            |                   |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([calls].[supporterID]))
            |                   |         |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_callend_supporterid] AS [calls]), SEEK:([calls].[callEnd] >= '2012-01-15 00:00:00.000' AND [calls].[callEnd] <= '2012-01-16 00:00:00.000') ORDERED FORWARD)
            |                   |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([sups].[supporterID]))
            |                        |--Index Scan(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]),  WHERE:(PROBE([Bitmap1006],[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID],N'[IN ROW]')))
            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([camps].[campaignID]))
                 |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]),  WHERE:(PROBE([Bitmap1007],[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID],N'[IN ROW]')))

此查询需要 2 分钟。它有一个额外的内部连接并且它在 where 子句中使用参数变量。
    declare @start datetime = '20120115'
    declare @end datetime = '20120116'

    select distinct camps.campaignid 
    from tblCampaigns camps
    inner join tblSupporterMainDetails sups on camps.campaignid = sups.campaignid
    inner join tblCallLogs calls on sups.supporterid = calls.supporterid
    where calls.callEnd between @start and @end

  |--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
       |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
       |--Top(TOP EXPRESSION:((1)))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([calls].[callID], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
                 |    |--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
                 |    |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterID_closingCall] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]) ORDERED FORWARD)
                 |--Clustered Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[AUTOGEN_PK_tblCallLogs] AS [calls]), SEEK:([calls].[callID]=[GOGEN].[dbo].[tblCallLogs].[callID] as [calls].[callID]),  WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) LOOKUP ORDERED FORWARD)

笔记:
  • 我相信缓慢是由 tblCallLogs 上的聚集索引搜索引起的。 ,但是不知道为什么SQL Server会选择这个执行计划。
  • 我应该使用查询优化器提示吗?我需要并且不愿意告诉 SQL Server 如何完成它的工作...
  • 该问题似乎是由多种因素引起的 - 额外的连接和变量。
  • 执行计划在为查询找到变量时是否会尝试重用“坏”计划?
  • 在现实生活中,我将不得不使用参数变量。常数不好!所以这个问题可能存在于我的许多查询/存储过程中!
  • 我在 tblCampaigns 上重建了索引并更新了统计信息和 tblSupporterMainDetails .这没有效果。
  • 两个表在主键(标识整数)上都有聚集索引。
  • 外键列campaignid被索引。
  • 所有查询都使用相同的参数值 - 天气它被用作变量或常量。

  • 表中的记录数:
  • tblSupporterMainDetails = 12,561,900
  • tblCallLogs = 27,242,224
  • tblCampaigns = 756

  • 更新:
  • 我还在 tblcalllogs 上重建了索引并更新了统计信息。 .没有效果。
  • 我已经使用 DBCC FREEPROCCACHE 清除了执行计划缓存
  • tblCallLogs.callEnd 是一个日期时间。

  • 涉及列的模式:
    tblCampaign.campaignid int not null
    tblSupporterMainDetails.campaignid int not null
    tblSupporterMainDetails.supporterid int not null
    tblCallLogs.supporterid int not null
    tblCallLogs.callEnd datetime not null
    

    索引:

    Indexes

    更新 2:
    向 tblCallLogs.supporterId 添加索引后 - 包含列:callEnd
    “慢”查询加快了 40 秒。
    更新后的执行计划:
      |--Nested Loops(Inner Join, OUTER REFERENCES:([camps].[campaignID]))
       |--Index Scan(OBJECT:([GOGEN].[dbo].[tblCampaigns].[IX_tblCampaigns_isActive] AS [camps]))
       |--Top(TOP EXPRESSION:((1)))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([sups].[supporterID], [Expr1006]) WITH UNORDERED PREFETCH)
                 |--Index Seek(OBJECT:([GOGEN].[dbo].[tblSupporterMainDetails].[AUTOGEN_IX_tblSupporterMainDetails_campaignID] AS [sups]), SEEK:([sups].[campaignID]=[GOGEN].[dbo].[tblCampaigns].[campaignID] as [camps].[campaignID]) ORDERED FORWARD)
                 |--Index Seek(OBJECT:([GOGEN].[dbo].[tblCallLogs].[IX_tblCallLogs_supporterid_callend] AS [calls]), SEEK:([calls].[supporterID]=[GOGEN].[dbo].[tblSupporterMainDetails].[supporterID] as [sups].[supporterID]),  WHERE:([GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]>=[@s2] AND [GOGEN].[dbo].[tblCallLogs].[callEnd] as [calls].[callEnd]<=[@e2]) ORDERED FORWARD)
    

    解决方案:

    额外的连接实际上并没有直接导致问题,但是它显然改变了语句,以便 sql server 为其持有不同的执行计划。
    通过添加
    选项(重新编译)
    到慢语句的结尾,我能够获得预期的快速性能。即 < 1 秒。我仍然不确定这个解决方案是否有效 - 为什么没有刷新所有计划?这是参数嗅探的经典案例吗?当我知道确切答案时,我会更新这篇文章 - 或者直到有人可以给出明确的答案。
    感谢@LievenKeersmaekers 和@JNK 到目前为止的帮助...

    最佳答案

    导致解决方案的摘要:

    supporterid, callEnd 上添加覆盖索引.

    这里的假设是优化器 可以 使用此索引(与 callEnd、supporterid 相反)来

  • 第一次加入 tblSupporterMainDetailstblCallLogs
  • where 中进一步使用它选择条款 callEnd

  • 添加选项 OPTION(RECOMPILE)

    TiborK 和 Hunchback 的所有 cudo 用于解释使用硬编码常量或变量的优化器的差异。

    Performance Impact - Constant value -vs- Variable

    When you use the constant, the value is known to the optimizer so it can determine selectivity (and possible index usage) based on that. When you use a variable, the value is unknown to the optimizer (so it have to go by some hardwired value or possibly density info). So, technically, this isn't parameter sniffing, but whatever article you find on that subject should also explain the difference between a constant and a variable. Using OPTION(RECOMPILE) will actually turn the variabe to a parameter sniffing situation.

    In essence, there is a big difference between a constant, a variable and a paramater (whcih can be sniffed).

    关于performance - 参数变量导致查询慢,但为什么呢?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/15682621/

    相关文章:

    sql - 将 IF EXISTS 移至 WHERE 子句

    python - 在这个简单的例子中,为什么 Matlab 看起来比 Python 慢得多

    asp.net - 问题设置订阅查询通知

    sql - SQL Server 连接是否加密?

    c# - SqlCommand 最大参数异常在 2099 个参数

    javascript - jquery:选择选项和背景颜色

    php - C++ map 查找性能与 PHP 数组查找性能

    C# Active Directory 调用非常慢

    mysql - 复制 MySQL 表并填充其他字段

    mysql - 如何进行mysql搜索以排除重复记录