我在尝试解决一个更复杂的问题时遇到了这个小问题,并且在尝试找出优化器时已经陷入困境。所以,假设我有一个名为“MyTable”的表,可以这样定义:
CREATE TABLE MyTable (
GroupClosuresID int identity(1,1) not null,
SiteID int not null,
DeleteDateTime datetime null
, CONSTRAINT PK_MyTable PRIMARY KEY (GroupClosuresID, SiteID))
该表有 286,685 行,运行 DBCC SHOW_STATISTICS('MyTable','PK_MyTable')
将产生:
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows
-------------------------------------------------------------------------------------------------------------------------------- -------------------- -------------------- -------------------- ------ ------------- ------------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------
PK_MyTable Aug 10 2011 1:00PM 286685 286685 18 0.931986 8 NO NULL 286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3.743145E-06 4 GroupClosuresID
3.488149E-06 8 GroupClosuresID, SiteID
(2 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
1 0 8 0 1
129 1002 7 127 7.889764
242 826 6 112 7.375
531 2010 6 288 6.979167
717 1108 5 185 5.989189
889 822 4 171 4.807017
1401 2044 4 511 4
1763 1101 3 361 3.049861
14207 24780 1 12443 1.991481
81759 67071 1 67071 1
114457 31743 1 31743 1
117209 2047 1 2047 1
179109 61439 1 61439 1
181169 1535 1 1535 1
229410 47615 1 47615 1
235846 2047 1 2047 1
275456 39442 1 39442 1
275457 0 1 0 1
现在我对此表运行查询,但没有创建任何其他索引或统计信息。
SELECT GroupClosuresID FROM MyTable WHERE SiteID = 1397 AND DeleteDateTime IS NULL
现在出现两个新的统计对象,一个用于 SiteID
列,另一个用于 DeleteDateTime
列。分别如下(注:排除了一些不相关的信息):
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows
-------------------------------------------------------------------------------------------------------------------------------- -------------------- -------------------- -------------------- ------ ------------- ------------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------
_WA_Sys_00000002_7B0C223C Aug 10 2011 1:15PM 286685 216605 200 0.03384706 4 NO NULL 286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0007380074 4 SiteID
(1 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
.
.
.
1397 59.42782 16005.02 5 11.83174
.
.
.
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows
-------------------------------------------------------------------------------------------------------------------------------- -------------------- -------------------- -------------------- ------ ------------- ------------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------
_WA_Sys_00000006_7B0C223C Aug 10 2011 1:15PM 286685 216605 201 0.7447883 0.8335911 NO NULL 286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0001065871 0.8335911 DeleteDateTime
(1 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
----------------------- ------------- ------------- -------------------- --------------
NULL 0 255827 0 1
.
.
.
为上面运行的查询生成的执行计划没有给我带来惊喜。它由一个简单的聚集索引扫描组成,其中包含 14282.3 估计行和 15676 实际行。根据我对统计和成本估算的了解,使用上面的两个直方图,我们可以将 SiteID (16005.02/286685) 的选择性乘以 DeleteDateTime (255827/286685) 的选择性,以获得 0.0498187307480119 的复合选择性。乘以总行数 (286685) 得到的结果与优化器所做的完全相同:14282.3。
但这就是我感到困惑的地方。我使用 CREATE INDEX IX_MyTable ON Mytable (SiteID, DeleteDateTime)
创建一个索引,它创建自己的统计对象:
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows
-------------------------------------------------------------------------------------------------------------------------------- -------------------- -------------------- -------------------- ------ ------------- ------------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------
IX_MyTable Aug 10 2011 1:41PM 286685 286685 200 0.02749305 8.822645 NO NULL
286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0007107321 4 SiteID
7.42611E-05 4.822645 SiteID, DeleteDateTime
3.488149E-06 8.822645 SiteID, DeleteDateTime, GroupClosuresID
(3 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
.
.
.
1397 504 15686 12 42
.
.
.
当我运行与以前相同的查询时(SELECT GroupClosuresID FROM MyTable WHERE SiteID = 1397 AND DeleteDateTime IS NULL
),我仍然返回 15676 行,但我的估计行数现在为 181.82 。
我尝试过操纵数字来尝试找出这个估计值的来源,但我就是无法得到它。我必须假设它与 IX_MyTable 的密度值有关。
任何帮助将不胜感激。谢谢!!
编辑:这是最后一次查询执行的执行计划。
最佳答案
这个需要一些挖掘!
它的产品是:
- 日期字段中的
NULL
密度(来自第一组统计数据255827/286685 = .892363
- ...乘以新索引中第一个字段 (
siteid
) 的密度:0.0007107321
公式为:
.00071017321 * 286685 = 203.7562
-- est. rows with your value in siteid based on even distribution of values
255827 / 286685 = 0.892363
-- Probability of a NULL across all rows
203.7562 * 0.892363 = 181.8245
我猜测,由于此实例中的行计数实际上不会影响任何内容,因此优化器采用了最简单的路线,只是将概率相乘。
关于sql - 统计和基数估计 - 为什么我会看到这个结果?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/7015520/