sql - 将边描述表转换为包含树的数组列表

标签 sql snowflake-cloud-data-platform graph-theory

我在 Snowflake 中有两个表,它们定义了一堆图表。它们看起来像这样:

节点表:

node    family   <other descriptive columns>  
A       1
B       1
C       1
D       1
E       1
F       1
G       1
H       1
I       2
J       2

etc...

边缘表:

to   from
A        B
B        C
B        D
D        E
F        G
G        H

etc...

族描述了一个或多个相关图的集合;其中一些可能是单个图,其他可能是几个小图等。这些图是单向的并且可能有多个分支/ fork 。目前这些图都没有标签(也就是说,需要从连接中推断出来)。

我想创建一个包含图形列表的表格,例如:

family   id    structure
1        1     [A -> B -> C, B -> D -> E]
1        2     [F -> G -> H] 
2 ... etc...

其中每一行代表一组连接的节点,该组的结构和 id 是图形对象的新标签。

我使用递归 CTE 取得了一些进展,但遇到了困难。非常感谢任何帮助!

最佳答案

首先我们需要构建递归。

WITH nodes(node, family) AS (
    SELECT * 
    FROM VALUES
        ('A',1),
        ('B',1),
        ('C',1),
        ('D',1),
        ('E',1),
        ('F',1),
        ('G',1),
        ('H',1),
        ('I',2),
        ('J',2)
), edges(_from, _to) AS (
    SELECT * 
    FROM VALUES
        ('A','B'),
        ('B','C'),
        ('B','D'),
        ('D','E'),
        ('F','G'),
        ('G','H')
), recursive_wrapper AS (
    WITH recur AS (
        -- Anchor Clause
        select
            family, 
            node, 
            node as tail, 
            array_construct(node) path, 
            1 as depth
        from nodes
        
        union all

        -- Recursive Clause
        select r.family,
            r.node,
            e._to as tail,
            ARRAY_APPEND(r.path, e._to) as path,
            r.depth + 1 as depth 
        from recur AS r
        join edges AS e
            on r.tail = e._from
    )
    SELECT * 
    FROM recur
)
SELECT * 
FROM recursive_wrapper
WHERE node = 'A'

给出:

<表类=“s-表”> <标题> 家庭 节点 尾部 路径 深度 <正文> 1 一个 一个 [“A”] 1 1 一个 B [“A”,“B”] 2 1 一个 C [“A”,“B”,“C”] 3 1 一个 D [“A”,“B”,“D”] 3 1 一个 E [“A”、“B”、“D”、“E”] 4

然后我们需要修剪升华的行(在上面我们想要删除深度 1,2,4,因为它们是另一行的子部分),我们可以通过将 path 从数组交换为字符串来做到这一点

), recursive_wrapper AS (
    WITH recur AS (
        -- Anchor Clause
        select
            family, 
            node, 
            node as tail, 
            node::text path, /* this cast it to max size, verse the largest size of the input */
            1 as depth
        from nodes
        
        union all

        -- Recursive Clause
        select r.family,
            r.node,
            e._to as tail,
            r.path || '_' || e._to as path,
            r.depth + 1 as depth 
        from recur AS r
        join edges AS e
            on r.tail = e._from
    )
    SELECT * 
    FROM recur
)
SELECT a.*
    ,b.path as b_path
    ,b.depth as b_depth
    ,startswith(b.path, a.path) as starts
    ,sum(iff(starts,1,0)) over(partition by a.family, a.node, a.path) as dominated_count
FROM recursive_wrapper as a
LEFT JOIN recursive_wrapper as b 
    ON a.family = b.family AND a.node = b.node AND a.depth < b.depth
WHERE a.node = 'A'
ORDER BY 1,2,4;

给出:

<表类=“s-表”> <标题> 家庭 节点 尾部 路径 深度 B_PATH B_DEPTH 开始 DOMINATED_COUNT <正文> 1 一个 一个 一个 1 A_B 2 正确 4 1 一个 一个 一个 1 A_B_C 3 正确 4 1 一个 一个 一个 1 A_B_D 3 正确 4 1 一个 一个 一个 1 A_B_D_E 4 正确 4 1 一个 B A_B 2 A_B_C 3 正确 3 1 一个 B A_B 2 A_B_D 3 正确 3 1 一个 B A_B 2 A_B_D_E 4 正确 3 1 一个 C A_B_C 3 A_B_D_E 4 错误 0 1 一个 D A_B_D 3 A_B_D_E 4 正确 1 1 一个 E A_B_D_E 4 0

所以我们只想保留那些计数为零的内容,我们可以在 QUALIFY 中做到这一点

SELECT a.*
    --,b.path as b_path
    --,b.depth as b_depth
    --,startswith(b.path, a.path) as starts
    --,sum(iff(starts,1,0)) over(partition by a.family, a.node, a.path) as dominated_count

FROM recursive_wrapper as a
LEFT JOIN recursive_wrapper as b 
    ON a.family = b.family AND a.node = b.node AND a.depth < b.depth
WHERE a.node = 'A'
QUALIFY sum(iff(startswith(b.path, a.path),1,0)) over(partition by a.family, a.node, a.path) = 0
ORDER BY 1,2,4;

给出:

<表类=“s-表”> <标题> 家庭 节点 尾部 路径 深度 <正文> 1 一个 C A_B_C 3 1 一个 E A_B_D_E 4

这会引发循环,我们可以通过将数组放回原处并检查数组中已有的值来防止循环

WITH nodes(node, family) AS (
    SELECT * 
    FROM VALUES
        ('A',1),
        ('B',1),
        ('C',1),
        ('D',1),
        ('E',1),
        ('F',1),
        ('G',1),
        ('H',1),
        ('I',2),
        ('J',2)
), edges(_from, _to) AS (
    SELECT * 
    FROM VALUES
        ('A','B'),
        ('B','C'),
        ('B','D'),
        ('D','E'),
        ('F','G'),
        ('G','H')
), recursive_wrapper AS (
    WITH recur AS (
        -- Anchor Clause
        select
            family, 
            node, 
            node as tail, 
            node::text path, /* this cast it to max size, verse the largest size of the input */
            array_construct(node) a_path, 
            1 as depth
        from nodes
        
        union all

        -- Recursive Clause
        select r.family,
            r.node,
            e._to as tail,
            r.path || '_' || e._to as path,
            ARRAY_APPEND(r.a_path, e._to) as a_path,
            r.depth + 1 as depth 
        from recur AS r
        join edges AS e
            on r.tail = e._from AND ARRAY_CONTAINS( e._to::variant , r.a_path ) = FALSE
    )
    SELECT * 
    FROM recur
)
SELECT a.*
FROM recursive_wrapper as a
LEFT JOIN recursive_wrapper as b 
    ON a.family = b.family AND a.node = b.node AND a.depth < b.depth
QUALIFY sum(iff(startswith(b.path, a.path),1,0)) over(partition by a.family, a.node, a.path) = 0
ORDER BY 1,2,4;

关于sql - 将边描述表转换为包含树的数组列表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/71206840/

相关文章:

image-processing - 识别噪声图像中的拓扑图

python - 从 sql 模式文件中提取 Table Create 语句

sql - 雪花: SELECT "COLUMN" with double quotes

MySQL 查询优化 : how to optimize voting calculations?

django 到雪花连接并运行 ORM 查询

sql - 从雪花中的键值数组构建具有动态列的表

python - Networkx 寻找有向图的社区

algorithm - 确定给定的加权图是否具有唯一的 MST

SQL Server 不带 INTO 插入

SQL 构建搜索查询以匹配多个 id