我有以下两种数据结构。
第一 ,应用于对象三元组的属性列表:
Object1 Object2 Object3 Property Value
O1 O2 O3 P1 "abc"
O1 O2 O3 P2 "xyz"
O1 O3 O4 P1 "123"
O2 O4 O5 P1 "098"
第二 ,继承树:
O1
O2
O4
O3
O5
或视为一种关系:
Object Parent
O2 O1
O4 O2
O3 O1
O5 O3
O1 null
其语义是 O2 从 O1 继承属性; O4 - 来自 O2 和 O1; O3——来自O1;和 O5 - 来自 O3 和 O1,按优先顺序排列。
注意 1:我有一种有效的方法来选择给定对象的所有子项或所有父项。这目前是通过左右索引实现的,但hierarchyid 也可以工作。这在目前看来并不重要。
注意 2:我有 tiggers 来确保“对象”列始终包含所有可能的对象,即使它们实际上不必在那里(即没有定义父级或子级)。这使得可以使用
inner join
s 而不是效率低得多 outer join
s。目标是 :给定一对 (Property, Value),返回具有该属性的所有对象三元组,该属性具有显式定义或从父项继承的值。
注 1:对象三元组
(X,Y,Z)
被认为是三重 (A,B,C)
的“父”当 X = A
为真时或 X is a parent of A
,同样适用于 (Y,B)
和 (Z,C)
.注意 2:在较近的父级上定义的属性“覆盖”在较远的父级上定义的相同属性。
注 3:当 (A,B,C) 有两个父项 - (X1,Y1,Z1) 和 (X2,Y2,Z2) 时,则 (X1,Y1,Z1) 在以下情况下被视为“更接近”的父项:
(a) X2 是 X1 的父代,或
(b) X2 = X1 且 Y2 是 Y1 的父代,或
(c) X2 = X1 且 Y2 = Y1 且 Z2 是 Z1 的父级
换句话说,三元组的祖先中的“接近度”首先基于三元组的第一个分量,然后是第二个分量,然后是第三个分量。
该规则根据祖先为三元组建立了明确的偏序。
例如,给定一对 (P1, "abc"),三元组的结果集将是:
O1, O2, O3 -- Defined explicitly
O1, O2, O5 -- Because O5 inherits from O3
O1, O4, O3 -- Because O4 inherits from O2
O1, O4, O5 -- Because O4 inherits from O2 and O5 inherits from O3
O2, O2, O3 -- Because O2 inherits from O1
O2, O2, O5 -- Because O2 inherits from O1 and O5 inherits from O3
O2, O4, O3 -- Because O2 inherits from O1 and O4 inherits from O2
O3, O2, O3 -- Because O3 inherits from O1
O3, O2, O5 -- Because O3 inherits from O1 and O5 inherits from O3
O3, O4, O3 -- Because O3 inherits from O1 and O4 inherits from O2
O3, O4, O5 -- Because O3 inherits from O1 and O4 inherits from O2 and O5 inherits from O3
O4, O2, O3 -- Because O4 inherits from O1
O4, O2, O5 -- Because O4 inherits from O1 and O5 inherits from O3
O4, O4, O3 -- Because O4 inherits from O1 and O4 inherits from O2
O5, O2, O3 -- Because O5 inherits from O1
O5, O2, O5 -- Because O5 inherits from O1 and O5 inherits from O3
O5, O4, O3 -- Because O5 inherits from O1 and O4 inherits from O2
O5, O4, O5 -- Because O5 inherits from O1 and O4 inherits from O2 and O5 inherits from O3
请注意,此列表中没有三人组(O2、O4、O5)。这是因为属性 P1 是为三元组 (O2, O4, O5) 明确定义的,这阻止了该三元组从 (O1, O2, O3) 继承该属性。
另请注意,三元组(O4、O4、O5)也没有出现。这是因为该三元组从 (O2, O4, O5) 继承了 P1="098"的值,因为它比 (O1, O2, O3) 更接近父级。
直接的方法如下。
首先,对于定义属性的每个三元组,选择所有可能的子三元组:
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value
from TriplesAndProperties tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.O1
inner join Objects as Objects2 on Objects2.Id = tp.O2
inner join Objects as Objects3 on Objects3.Id = tp.O3
-- Then add all possible children of all those objects
inner join Objects as Children1 on Objects1.Id [isparentof] Children1.Id
inner join Objects as Children2 on Objects2.Id [isparentof] Children2.Id
inner join Objects as Children3 on Objects3.Id [isparentof] Children3.Id
但这并不是故事的全部:如果某个三元组从多个父级继承了相同的属性,则此查询将产生相互矛盾的结果。
因此,第二步是只选择那些相互矛盾的结果之一:
select * from
(
select
Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
order by Objects1.[depthInTheTree] descending, Objects2.[depthInTheTree] descending, Objects3.[depthInTheTree] descending
)
as InheritancePriority
from
... (see above)
)
where InheritancePriority = 1
窗函数
row_number() over( ... )
执行以下操作:对于对象三元组和属性的每个唯一组合,它按照从三元组到值继承自的父项的祖先距离对所有值进行排序,然后我只选择结果值列表中的第一个。使用
GROUP BY
可以达到类似的效果。和 ORDER BY
语句,但我只是发现窗口函数在语义上更清晰(它们产生的执行计划是相同的)。关键是,我需要选择最接近的贡献祖先,为此我需要分组然后在组内排序。
最后,现在我可以简单地通过属性和值过滤结果集。
这个方案有效。非常可靠和可预测。
事实证明,它对于执行的业务任务非常强大。
唯一的麻烦是,太慢了 .
有人可能会指出七个表的连接可能会减慢速度,但这实际上不是瓶颈。
根据我从 SQL Management Studio(以及 SQL Profiler)获得的实际执行计划,瓶颈是排序。
问题是,为了满足我的窗口函数,服务器必须按
Children1.Id, Children2.Id, Children3.Id, tp.Property, Parents1.[depthInTheTree] descending, Parents2.[depthInTheTree] descending, Parents3.[depthInTheTree] descending
排序,并且不能有它可以使用的索引,因为这些值来自多个表的交叉连接。编辑:根据迈克尔布恩的建议(谢谢迈克尔),我已将整个谜题发布到 sqlfiddle here .在执行计划中可以看到,Sort 操作占整个查询的 32%,并且会随着总行数的增加而增长,因为所有其他操作都使用索引。
通常在这种情况下我会使用索引 View ,但在这种情况下不会,因为索引 View 不能包含自联接,其中有六个。
到目前为止,我能想到的唯一方法是创建 Objects 表的六个副本,然后将它们用于连接,从而启用索引 View 。
是时候让我沦为那种黑客了吗?绝望袭上心头。
最佳答案
我有 3 个可能的答案。
您的问题的 sql fiddle 在这里:http://sqlfiddle.com/#!3/7c7a0/3/0
我的答案的 sql fiddle 在这里:http://sqlfiddle.com/#!3/5d257/1
警告:
工作答案:
这是我从 sqlfiddle 回答的架构部分:
Create Table Objects
(
Id int not null identity primary key,
LeftIndex int not null default 0,
RightIndex int not null default 0
)
alter table Objects add ParentId int null references Objects
CREATE TABLE TP
(
Object1 int not null references Objects,
Object2 int not null references Objects,
Object3 int not null references Objects,
Property varchar(20) not null,
Value varchar(50) not null
)
insert into Objects(LeftIndex, RightIndex) values(1, 10)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 2, 5)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 6, 9)
insert into Objects(ParentId, LeftIndex, RightIndex) values(2, 3, 4)
insert into Objects(ParentId, LeftIndex, RightIndex) values(3, 7, 8)
insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P1', 'abc')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P2', 'xyz')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,3,4, 'P1', '123')
insert into TP(Object1, Object2, Object3, Property, Value) values(2,4,5, 'P1', '098')
create index ix_LeftIndex on Objects(LeftIndex)
create index ix_RightIndex on Objects(RightIndex)
create index ix_Objects on TP(Property, Value, Object1, Object2, Object3)
create index ix_Prop on TP(Property)
GO
---------- QUESTION ADDITIONAL SCHEMA --------
CREATE VIEW TPResultView AS
Select O1, O2, O3, Property, Value
FROM
(
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
order by Objects1.LeftIndex desc, Objects2.LeftIndex desc, Objects3.LeftIndex desc
)
as Idx
from tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.Object1
inner join Objects as Objects2 on Objects2.Id = tp.Object2
inner join Objects as Objects3 on Objects3.Id = tp.Object3
-- Then add all possible children of all those objects
inner join Objects as Children1 on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
inner join Objects as Children2 on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
inner join Objects as Children3 on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
) as x
WHERE idx = 1
GO
---------- ANSWER 1 SCHEMA --------
CREATE VIEW TPIntermediate AS
select tp.Property, tp.Value
, Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
, Objects1.LeftIndex as PL1, Objects2.LeftIndex as PL2, Objects3.LeftIndex as PL3
, Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3
from tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.Object1
inner join Objects as Objects2 on Objects2.Id = tp.Object2
inner join Objects as Objects3 on Objects3.Id = tp.Object3
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
GO
---------- ANSWER 2 SCHEMA --------
-- Partial calculation using an indexed view
-- Circumvented the self-join limitation using a black magic technique, based on
-- http://jmkehayias.blogspot.com/2008/12/creating-indexed-view-with-self-join.html
CREATE TABLE dbo.multiplier (i INT PRIMARY KEY)
INSERT INTO dbo.multiplier VALUES (1)
INSERT INTO dbo.multiplier VALUES (2)
INSERT INTO dbo.multiplier VALUES (3)
GO
CREATE VIEW TPIndexed
WITH SCHEMABINDING
AS
SELECT tp.Object1, tp.object2, tp.object3, tp.property, tp.value,
SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL1,
SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL2,
SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL3,
SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.RightIndex ELSE NULL END, 0)) as PR1,
SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.RightIndex ELSE NULL END, 0)) as PR2,
SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.RightIndex ELSE NULL END, 0)) as PR3,
COUNT_BIG(*) as ID
FROM dbo.tp
cross join dbo.multiplier M
inner join dbo.Objects
on (M.i = 1 AND Objects.Id = tp.Object1)
or (M.i = 2 AND Objects.Id = tp.Object2)
or (M.i = 3 AND Objects.Id = tp.Object3)
GROUP BY tp.Object1, tp.object2, tp.object3, tp.property, tp.value
GO
-- This index is mostly useless but required
create UNIQUE CLUSTERED index pk_TPIndexed on dbo.TPIndexed(property, value, object1, object2, object3)
-- Once we have the clustered index, we can create a nonclustered that actually addresses our needs
create NONCLUSTERED index ix_TPIndexed on dbo.TPIndexed(property, value, PL1, PL2, PL3, PR1, PR2, PR3)
GO
-- NOTE: this View is not indexed, but is uses the indexed view
CREATE VIEW TPIndexedResultView AS
Select O1, O2, O3, Property, Value
FROM
(
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by tp.Property, Children1.Id, Children2.Id, Children3.Id
order by tp.Property, Tp.PL1 desc, Tp.PL2 desc, Tp.PL3 desc
)
as Idx
from TPIndexed as TP WITH (NOEXPAND)
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
) as x
WHERE idx = 1
GO
-- NOTE: this View is not indexed, but is uses the indexed view
CREATE VIEW TPIndexedIntermediate AS
select tp.Property, tp.Value
, Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
, PL1, PL2, PL3
, Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3
from TPIndexed as TP WITH (NOEXPAND)
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
GO
---------- ANSWER 3 SCHEMA --------
-- You're talking about making six copies of the TP table
-- If you're going to go that far, you might as well, go the trigger route
-- The performance profile is much the same - slower on insert, faster on read
-- And instead of still recalculating on every read, you'll be recalculating
-- only when the data changes.
CREATE TABLE TPResult
(
Object1 int not null references Objects,
Object2 int not null references Objects,
Object3 int not null references Objects,
Property varchar(20) not null,
Value varchar(50) not null
)
GO
create UNIQUE index ix_Result on TPResult(Property, Value, Object1, Object2, Object3)
--You'll have to imagine this trigger, sql fiddle doesn't want to do it
--CREATE TRIGGER tr_TP
--ON TP
-- FOR INSERT, UPDATE, DELETE
--AS
-- DELETE FROM TPResult
-- -- For this example we'll just insert into the table once
INSERT INTO TPResult
SELECT O1, O2, O3, Property, Value
FROM TPResultView
从 sqlfiddle 查询我的部分答案:
-------- QUESTION QUERY ----------
-- Original query, modified to use the view I added
SELECT O1, O2, O3, Property, Value
FROM TPResultView
WHERE property = 'P1' AND value = 'abc'
-- Your assertion is that this order by is the most expensive part.
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
-- ORDER BY O1, O2, O3
GO
-------- ANSWER 1 QUERY ----------
-- A different way to get the same result.
-- Query optimizer says this is more expensive, but I've seen cases where
-- it says a query is more expensive but it returns results faster.
SELECT O1, O2, O3, Property, Value
FROM (
SELECT A.O1, A.O2, A.O3, A.Property, A.Value
FROM TPIntermediate A
LEFT JOIN TPIntermediate B ON A.O1 = B.O1
AND A.O2 = B.O2
AND A.O3 = B.O3
AND A.Property = B.Property
AND
(
-- Find any rows with Parent LeftIndex triplet that is greater than this one
(A.PL1 < B.PL1
AND A.PL2 < B.PL2
AND A.PL3 < B.PL3)
OR
-- Find any rows with LeftIndex triplet that is greater than this one
(A.CL1 < B.CL1
AND A.CL2 < B.CL2
AND A.CL3 < B.CL3)
)
-- If this row has any rows that match the previous two cases, exclude it
WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value = 'abc'
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.
-- Howeer, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO
-------- ANSWER 2 QUERIES ----------
-- Same as above but using an indexed view to partially calculate results
SELECT O1, O2, O3, Property, Value
FROM TPIndexedResultView
WHERE property = 'P1' AND value = 'abc'
-- Your assertion is that this order by is the most expensive part.
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
--ORDER BY O1, O2, O3
GO
SELECT O1, O2, O3, Property, Value
FROM (
SELECT A.O1, A.O2, A.O3, A.Property, A.Value
FROM TPIndexedIntermediate A
LEFT JOIN TPIndexedIntermediate B ON A.O1 = B.O1
AND A.O2 = B.O2
AND A.O3 = B.O3
AND A.Property = B.Property
AND
(
-- Find any rows with Parent LeftIndex triplet that is greater than this one
(A.PL1 < B.PL1
AND A.PL2 < B.PL2
AND A.PL3 < B.PL3)
OR
-- Find any rows with LeftIndex triplet that is greater than this one
(A.CL1 < B.CL1
AND A.CL2 < B.CL2
AND A.CL3 < B.CL3)
)
-- If this row has any rows that match the previous two cases, exclude it
WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value = 'abc'
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.
-- Howeer, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO
-------- ANSWER 3 QUERY ----------
-- Returning results from a pre-calculated table is fast and easy
-- Unless your are doing many more inserts than reads, or your result
-- set is very large, this is a fine way to compensate for a poor design
-- in one area of your database.
SELECT Object1 as O1, Object2 as O2, Object3 as O3, Property, Value
FROM TPResult
WHERE property = 'P1' AND value = 'abc'
ORDER BY O1, O2, O3
关于sql - 有效地通过覆盖处理继承,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/11992267/