sql - 有效地通过覆盖处理继承

标签 sql sql-server sql-server-2008 tree

我有以下两种数据结构。

第一 ,应用于对象三元组的属性列表:

Object1  Object2  Object3 Property  Value
     O1       O2       O3       P1  "abc"
     O1       O2       O3       P2  "xyz"
     O1       O3       O4       P1  "123"
     O2       O4       O5       P1  "098"

第二 ,继承树:
O1
    O2
        O4
    O3
        O5

或视为一种关系:
Object    Parent
    O2        O1
    O4        O2
    O3        O1
    O5        O3
    O1      null

其语义是 O2 从 O1 继承属性; O4 - 来自 O2 和 O1; O3——来自O1;和 O5 - 来自 O3 和 O1,按优先顺序排列。
注意 1:我有一种有效的方法来选择给定对象的所有子项或所有父项。这目前是通过左右索引实现的,但hierarchyid 也可以工作。这在目前看来并不重要。
注意 2:我有 tiggers 来确保“对象”列始终包含所有可能的对象,即使它们实际上不必在那里(即没有定义父级或子级)。这使得可以使用 inner join s 而不是效率低得多 outer join s。

目标是 :给定一对 (Property, Value),返回具有该属性的所有对象三元组,该属性具有显式定义或从父项继承的值。

注 1:对象三元组 (X,Y,Z)被认为是三重 (A,B,C) 的“父”当 X = A 为真时或 X is a parent of A ,同样适用于 (Y,B)(Z,C) .
注意 2:在较近的父级上定义的属性“覆盖”在较远的父级上定义的相同属性。
注 3:当 (A,B,C) 有两个父项 - (X1,Y1,Z1) 和 (X2,Y2,Z2) 时,则 (X1,Y1,Z1) 在以下情况下被视为“更接近”的父项:
(a) X2 是 X1 的父代,或
(b) X2 = X1 且 Y2 是 Y1 的父代,或
(c) X2 = X1 且 Y2 = Y1 且 Z2 是 Z1 的父级

换句话说,三元组的祖先中的“接近度”首先基于三元组的第一个分量,然后是第二个分量,然后是第三个分量。
该规则根据祖先为三元组建立了明确的偏序。

例如,给定一对 (P1, "abc"),三元组的结果集将是:
 O1, O2, O3     -- Defined explicitly
 O1, O2, O5     -- Because O5 inherits from O3
 O1, O4, O3     -- Because O4 inherits from O2
 O1, O4, O5     -- Because O4 inherits from O2 and O5 inherits from O3
 O2, O2, O3     -- Because O2 inherits from O1
 O2, O2, O5     -- Because O2 inherits from O1 and O5 inherits from O3
 O2, O4, O3     -- Because O2 inherits from O1 and O4 inherits from O2
 O3, O2, O3     -- Because O3 inherits from O1
 O3, O2, O5     -- Because O3 inherits from O1 and O5 inherits from O3
 O3, O4, O3     -- Because O3 inherits from O1 and O4 inherits from O2
 O3, O4, O5     -- Because O3 inherits from O1 and O4 inherits from O2 and O5 inherits from O3
 O4, O2, O3     -- Because O4 inherits from O1
 O4, O2, O5     -- Because O4 inherits from O1 and O5 inherits from O3
 O4, O4, O3     -- Because O4 inherits from O1 and O4 inherits from O2
 O5, O2, O3     -- Because O5 inherits from O1
 O5, O2, O5     -- Because O5 inherits from O1 and O5 inherits from O3
 O5, O4, O3     -- Because O5 inherits from O1 and O4 inherits from O2
 O5, O4, O5     -- Because O5 inherits from O1 and O4 inherits from O2 and O5 inherits from O3

请注意,此列表中没有三人组(O2、O4、O5)。这是因为属性 P1 是为三元组 (O2, O4, O5) 明确定义的,这阻止了该三元组从 (O1, O2, O3) 继承该属性。
另请注意,三元组(O4、O4、O5)也没有出现。这是因为该三元组从 (O2, O4, O5) 继承了 P1="098"的值,因为它比 (O1, O2, O3) 更接近父级。

直接的方法如下。
首先,对于定义属性的每个三元组,选择所有可能的子三元组:
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value
from TriplesAndProperties tp

-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.O1
inner join Objects as Objects2 on Objects2.Id = tp.O2
inner join Objects as Objects3 on Objects3.Id = tp.O3

-- Then add all possible children of all those objects
inner join Objects as Children1 on Objects1.Id [isparentof] Children1.Id
inner join Objects as Children2 on Objects2.Id [isparentof] Children2.Id
inner join Objects as Children3 on Objects3.Id [isparentof] Children3.Id

但这并不是故事的全部:如果某个三元组从多个父级继承了相同的属性,则此查询将产生相互矛盾的结果。
因此,第二步是只选择那些相互矛盾的结果之一:
select * from
(
    select 
        Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
        row_number() over( 
            partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
            order by Objects1.[depthInTheTree] descending, Objects2.[depthInTheTree] descending, Objects3.[depthInTheTree] descending
        )
        as InheritancePriority
    from
    ... (see above)
)
where InheritancePriority = 1

窗函数row_number() over( ... )执行以下操作:对于对象三元组和属性的每个唯一组合,它按照从三元组到值继承自的父项的祖先距离对所有值进行排序,然后我只选择结果值列表中的第一个。
使用 GROUP BY 可以达到类似的效果。和 ORDER BY语句,但我只是发现窗口函数在语义上更清晰(它们产生的执行计划是相同的)。
关键是,我需要选择最接近的贡献祖先,为此我需要分组然后在组内排序。

最后,现在我可以简单地通过属性和值过滤结果集。

这个方案有效。非常可靠和可预测。
事实证明,它对于执行的业务任务非常强大。

唯一的麻烦是,太慢了 .
有人可能会指出七个表的连接可能会减慢速度,但这实际上不是瓶颈。

根据我从 SQL Management Studio(以及 SQL Profiler)获得的实际执行计划,瓶颈是排序。
问题是,为了满足我的窗口函数,服务器必须按 Children1.Id, Children2.Id, Children3.Id, tp.Property, Parents1.[depthInTheTree] descending, Parents2.[depthInTheTree] descending, Parents3.[depthInTheTree] descending 排序,并且不能有它可以使用的索引,因为这些值来自多个表的交叉连接。

编辑:根据迈克尔布恩的建议(谢谢迈克尔),我已将整个谜题发布到 sqlfiddle here .在执行计划中可以看到,Sort 操作占整个查询的 32%,并且会随着总行数的增加而增长,因为所有其他操作都使用索引。

通常在这种情况下我会使用索引 View ,但在这种情况下不会,因为索引 View 不能包含自联接,其中有六个。

到目前为止,我能想到的唯一方法是创建 Objects 表的六个副本,然后将它们用于连接,从而启用索引 View 。
是时候让我沦为那种黑客了吗?绝望袭上心头。

最佳答案

我有 3 个可能的答案。

您的问题的 sql fiddle 在这里:http://sqlfiddle.com/#!3/7c7a0/3/0

我的答案的 sql fiddle 在这里:http://sqlfiddle.com/#!3/5d257/1

警告:

  • 查询分析器不够用 - 我注意到一些答案被拒绝,因为他们的查询计划比原始查询更昂贵。分析仪只是指南。根据实际的数据集、硬件和用例,成本较高的查询可以比成本较低的查询更快地返回结果。您必须在您的环境中进行测试。
  • 查询分析器无效 - 即使您找到了从查询中删除“最昂贵的步骤”的方法,它通常对您的查询没有任何影响。
  • 仅查询更改很少能缓解架构/设计问题 - 一些答案被拒绝,因为它们涉及架构级别的更改,例如触发器和附加表。拒绝优化的复杂查询强烈表明问题出在底层设计或我的期望上。您可能不喜欢它,但您可能不得不接受该问题在查询级别无法解决。
  • 索引 View 不能包含 row_number()/分区子句 - 通过创建对象表的六个副本来解决自联接问题不足以让您创建您建议的索引 View 。我在 this sqlfiddle 试过了.如果您取消注释最后一个“创建索引”语句,您将收到错误消息,因为您的 View “包含排名或聚合窗口函数”。

  • 工作答案:
  • 左连接而不是 row_number() - 您可以使用使用左连接的查询来排除在树中被覆盖较低的结果。从此查询中删除最后的“order by”实际上删除了一直困扰您的排序!此查询的执行计划仍然比您的原始计划昂贵,但请参阅上面的免责声明 #1。
  • 部分查询的索引 View - 使用一些严肃的查询魔法(基于 this technique),我为部分查询创建了一个索引 View 。此 View 可用于增强原始问题查询或答案 #1。
  • 实现为索引良好的表 - 其他人提出了这个答案,但他们可能没有很好地解释它。除非您的结果集非常大或者您对源表进行非常频繁的更新,否则实现查询的结果并使用触发器使它们保持最新是解决此类问题的完美方法。一旦你为你的查询创建了一个 View ,测试这个选项就很容易了。您可以重复使用答案 #2 来加速触发,然后随着时间的推移进一步改进。 (您正在谈论创建表的六个副本,请先尝试此操作。它可以保证您关心的选择的性能尽可能好。)

  • 这是我从 sqlfiddle 回答的架构部分:
    Create Table Objects
    (
        Id int not null identity primary key,
        LeftIndex int not null default 0,
        RightIndex int not null default 0
    )
    
    alter table Objects add ParentId int null references Objects
    
    CREATE TABLE TP
    (
        Object1 int not null references Objects,
        Object2 int not null references Objects,
        Object3 int not null references Objects,
        Property varchar(20) not null,
        Value varchar(50) not null
    )
    
    
    insert into Objects(LeftIndex, RightIndex) values(1, 10)
    insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 2, 5)
    insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 6, 9)
    insert into Objects(ParentId, LeftIndex, RightIndex) values(2, 3, 4)
    insert into Objects(ParentId, LeftIndex, RightIndex) values(3, 7, 8)
    
    insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P1', 'abc')
    insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P2', 'xyz')
    insert into TP(Object1, Object2, Object3, Property, Value) values(1,3,4, 'P1', '123')
    insert into TP(Object1, Object2, Object3, Property, Value) values(2,4,5, 'P1', '098')
    
    create index ix_LeftIndex on Objects(LeftIndex)
    create index ix_RightIndex on Objects(RightIndex)
    create index ix_Objects on TP(Property, Value, Object1, Object2, Object3)
    create index ix_Prop on TP(Property)
    GO
    
    ---------- QUESTION ADDITIONAL SCHEMA --------
    CREATE VIEW TPResultView AS
    Select O1, O2, O3, Property, Value
    FROM
    (
        select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
    
        row_number() over( 
            partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
            order by Objects1.LeftIndex desc, Objects2.LeftIndex desc, Objects3.LeftIndex desc
        )
        as Idx
    
        from tp
    
        -- Select corresponding objects of the triple
        inner join Objects as Objects1 on Objects1.Id = tp.Object1
        inner join Objects as Objects2 on Objects2.Id = tp.Object2
        inner join Objects as Objects3 on Objects3.Id = tp.Object3
    
        -- Then add all possible children of all those objects
        inner join Objects as Children1 on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
        inner join Objects as Children2 on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
        inner join Objects as Children3 on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
    ) as x
    WHERE idx = 1 
    GO
    
    ---------- ANSWER 1 SCHEMA --------
    
    CREATE VIEW TPIntermediate AS
    select tp.Property, tp.Value 
        , Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
        , Objects1.LeftIndex as PL1, Objects2.LeftIndex as PL2, Objects3.LeftIndex as PL3    
        , Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3    
        from tp
    
        -- Select corresponding objects of the triple
        inner join Objects as Objects1 on Objects1.Id = tp.Object1
        inner join Objects as Objects2 on Objects2.Id = tp.Object2
        inner join Objects as Objects3 on Objects3.Id = tp.Object3
    
        -- Then add all possible children of all those objects
        inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
        inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
        inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
    GO
    
    ---------- ANSWER 2 SCHEMA --------
    
    -- Partial calculation using an indexed view
    -- Circumvented the self-join limitation using a black magic technique, based on 
    -- http://jmkehayias.blogspot.com/2008/12/creating-indexed-view-with-self-join.html
    CREATE TABLE dbo.multiplier (i INT PRIMARY KEY)
    
    INSERT INTO dbo.multiplier VALUES (1) 
    INSERT INTO dbo.multiplier VALUES (2) 
    INSERT INTO dbo.multiplier VALUES (3) 
    GO
    
    CREATE VIEW TPIndexed
    WITH SCHEMABINDING
    AS
    
    SELECT tp.Object1, tp.object2, tp.object3, tp.property, tp.value,
        SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL1,
        SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL2,
        SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL3,
        SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.RightIndex ELSE NULL END, 0)) as PR1,
        SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.RightIndex ELSE NULL END, 0)) as PR2,
        SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.RightIndex ELSE NULL END, 0)) as PR3,
        COUNT_BIG(*) as ID
        FROM dbo.tp
        cross join dbo.multiplier M 
        inner join dbo.Objects 
        on (M.i = 1 AND Objects.Id = tp.Object1)
        or (M.i = 2 AND Objects.Id = tp.Object2)
        or (M.i = 3 AND Objects.Id = tp.Object3)
    GROUP BY tp.Object1, tp.object2, tp.object3, tp.property, tp.value
    GO
    
    -- This index is mostly useless but required
    create UNIQUE CLUSTERED index pk_TPIndexed on dbo.TPIndexed(property, value, object1, object2, object3)
    -- Once we have the clustered index, we can create a nonclustered that actually addresses our needs
    create NONCLUSTERED index ix_TPIndexed on dbo.TPIndexed(property, value, PL1, PL2, PL3, PR1, PR2, PR3)
    GO
    
    -- NOTE: this View is not indexed, but is uses the indexed view 
    CREATE VIEW TPIndexedResultView AS
    Select O1, O2, O3, Property, Value
    FROM
    (
        select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
    
        row_number() over( 
            partition by tp.Property, Children1.Id, Children2.Id, Children3.Id
            order by tp.Property, Tp.PL1 desc, Tp.PL2 desc, Tp.PL3 desc
        )
        as Idx
    
        from TPIndexed as TP WITH (NOEXPAND)
    
        -- Then add all possible children of all those objects
        inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
        inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
        inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
    ) as x
    WHERE idx = 1 
    GO
    
    
    -- NOTE: this View is not indexed, but is uses the indexed view 
    CREATE VIEW TPIndexedIntermediate AS
    select tp.Property, tp.Value 
        , Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
        , PL1, PL2, PL3    
        , Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3    
        from TPIndexed as TP WITH (NOEXPAND)
    
        -- Then add all possible children of all those objects
        inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
        inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
        inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3  
    GO
    
    
    ---------- ANSWER 3 SCHEMA --------
    -- You're talking about making six copies of the TP table
    -- If you're going to go that far, you might as well, go the trigger route
    -- The performance profile is much the same - slower on insert, faster on read
    -- And instead of still recalculating on every read, you'll be recalculating
    -- only when the data changes. 
    
    CREATE TABLE TPResult
    (
        Object1 int not null references Objects,
        Object2 int not null references Objects,
        Object3 int not null references Objects,
        Property varchar(20) not null,
        Value varchar(50) not null
    )
    GO
    
    create UNIQUE index ix_Result on TPResult(Property, Value, Object1, Object2, Object3)
    
    
    --You'll have to imagine this trigger, sql fiddle doesn't want to do it
    --CREATE TRIGGER tr_TP
    --ON TP
    --  FOR INSERT, UPDATE, DELETE
    --AS
    --  DELETE FROM TPResult
    -- -- For this example we'll just insert into the table once
    INSERT INTO TPResult 
    SELECT O1, O2, O3, Property, Value 
    FROM TPResultView
    

    从 sqlfiddle 查询我的部分答案:
    -------- QUESTION QUERY ----------
    -- Original query, modified to use the view I added
    SELECT O1, O2, O3, Property, Value 
    FROM TPResultView
    WHERE property = 'P1' AND value = 'abc'
    -- Your assertion is that this order by is the most expensive part. 
    -- Sometimes converting queries into views allows the server to
    -- Optimize them better over time.
    -- NOTE: removing this order by has no effect on this query.
    -- ORDER BY O1, O2, O3
    GO
    
    -------- ANSWER 1  QUERY ----------
    -- A different way to get the same result. 
    -- Query optimizer says this is more expensive, but I've seen cases where
    -- it says a query is more expensive but it returns results faster.
    SELECT O1, O2, O3, Property, Value
    FROM (
      SELECT A.O1, A.O2, A.O3, A.Property, A.Value
      FROM TPIntermediate A
      LEFT JOIN TPIntermediate B ON A.O1 = B.O1
        AND A.O2 = B.O2
        AND A.O3 = B.O3
        AND A.Property = B.Property
        AND 
        (
          -- Find any rows with Parent LeftIndex triplet that is greater than this one
          (A.PL1 < B.PL1
          AND A.PL2 < B.PL2
          AND A.PL3 < B.PL3) 
        OR
          -- Find any rows with LeftIndex triplet that is greater than this one
          (A.CL1 < B.CL1
          AND A.CL2 < B.CL2
          AND A.CL3 < B.CL3)
        )
      -- If this row has any rows that match the previous two cases, exclude it
      WHERE B.O1 IS NULL ) AS x
    WHERE property = 'P1' AND value = 'abc'
    -- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
    -- that has been the focus of your question.   
    -- Howeer, it wasn't clear from your question whether this order by was required.
    --ORDER BY O1, O2, O3
    GO
    
    -------- ANSWER 2  QUERIES ----------
    -- Same as above but using an indexed view to partially calculate results
    
    SELECT O1, O2, O3, Property, Value 
    FROM TPIndexedResultView
    WHERE property = 'P1' AND value = 'abc'
    -- Your assertion is that this order by is the most expensive part. 
    -- Sometimes converting queries into views allows the server to
    -- Optimize them better over time.
    -- NOTE: removing this order by has no effect on this query.
    --ORDER BY O1, O2, O3
    GO
    
    SELECT O1, O2, O3, Property, Value
    FROM (
      SELECT A.O1, A.O2, A.O3, A.Property, A.Value
      FROM TPIndexedIntermediate A
      LEFT JOIN TPIndexedIntermediate B ON A.O1 = B.O1
        AND A.O2 = B.O2
        AND A.O3 = B.O3
        AND A.Property = B.Property
        AND 
        (
          -- Find any rows with Parent LeftIndex triplet that is greater than this one
          (A.PL1 < B.PL1
          AND A.PL2 < B.PL2
          AND A.PL3 < B.PL3) 
        OR
          -- Find any rows with LeftIndex triplet that is greater than this one
          (A.CL1 < B.CL1
          AND A.CL2 < B.CL2
          AND A.CL3 < B.CL3)
        )
      -- If this row has any rows that match the previous two cases, exclude it
      WHERE B.O1 IS NULL ) AS x
    WHERE property = 'P1' AND value = 'abc'
    -- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
    -- that has been the focus of your question.   
    -- Howeer, it wasn't clear from your question whether this order by was required.
    --ORDER BY O1, O2, O3
    GO
    
    
    
    -------- ANSWER 3  QUERY ----------
    -- Returning results from a pre-calculated table is fast and easy
    -- Unless your are doing many more inserts than reads, or your result
    -- set is very large, this is a fine way to compensate for a poor design
    -- in one area of your database.
    SELECT Object1 as O1, Object2 as O2, Object3 as O3, Property, Value 
    FROM TPResult
    WHERE property = 'P1' AND value = 'abc'
    ORDER BY O1, O2, O3
    

    关于sql - 有效地通过覆盖处理继承,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/11992267/

    相关文章:

    sql - sql脚本第一次运行时检查表是否存在(优势数据架构师)

    sql-server - Mac 上有哪些免费的 SQL Server Management Studio

    asp.net - 过程或函数 ""需要未提供的参数 ""

    sql-server - 更改本地时钟会影响远程 SQL Server 数据库功能

    MySQL 查询在列值上具有不同的值

    mysql - SQL 高级 SELECT 语句

    sql-server - SQL Sum 查询问题

    sql-server - 实现悲观锁

    sql - 如何在 Sql Server 和过滤器记录中对集合进行分组

    java - SQL 不接受 AbsolutePath Java