我在 Snowflake 中有两个表,它们定义了一堆图表。它们看起来像这样:
节点表:
node family <other descriptive columns>
A 1
B 1
C 1
D 1
E 1
F 1
G 1
H 1
I 2
J 2
etc...
边缘表:
to from
A B
B C
B D
D E
F G
G H
etc...
族描述了一个或多个相关图的集合;其中一些可能是单个图,其他可能是几个小图等。这些图是单向的并且可能有多个分支/ fork 。目前这些图都没有标签(也就是说,需要从连接中推断出来)。
我想创建一个包含图形列表的表格,例如:
family id structure
1 1 [A -> B -> C, B -> D -> E]
1 2 [F -> G -> H]
2 ... etc...
其中每一行代表一组连接的节点,该组的结构和 id 是图形对象的新标签。
我使用递归 CTE 取得了一些进展,但遇到了困难。非常感谢任何帮助!
最佳答案
首先我们需要构建递归。
WITH nodes(node, family) AS (
SELECT *
FROM VALUES
('A',1),
('B',1),
('C',1),
('D',1),
('E',1),
('F',1),
('G',1),
('H',1),
('I',2),
('J',2)
), edges(_from, _to) AS (
SELECT *
FROM VALUES
('A','B'),
('B','C'),
('B','D'),
('D','E'),
('F','G'),
('G','H')
), recursive_wrapper AS (
WITH recur AS (
-- Anchor Clause
select
family,
node,
node as tail,
array_construct(node) path,
1 as depth
from nodes
union all
-- Recursive Clause
select r.family,
r.node,
e._to as tail,
ARRAY_APPEND(r.path, e._to) as path,
r.depth + 1 as depth
from recur AS r
join edges AS e
on r.tail = e._from
)
SELECT *
FROM recur
)
SELECT *
FROM recursive_wrapper
WHERE node = 'A'
给出:
然后我们需要修剪升华的行(在上面我们想要删除深度 1,2,4,因为它们是另一行的子部分),我们可以通过将 path
从数组交换为字符串来做到这一点
), recursive_wrapper AS (
WITH recur AS (
-- Anchor Clause
select
family,
node,
node as tail,
node::text path, /* this cast it to max size, verse the largest size of the input */
1 as depth
from nodes
union all
-- Recursive Clause
select r.family,
r.node,
e._to as tail,
r.path || '_' || e._to as path,
r.depth + 1 as depth
from recur AS r
join edges AS e
on r.tail = e._from
)
SELECT *
FROM recur
)
SELECT a.*
,b.path as b_path
,b.depth as b_depth
,startswith(b.path, a.path) as starts
,sum(iff(starts,1,0)) over(partition by a.family, a.node, a.path) as dominated_count
FROM recursive_wrapper as a
LEFT JOIN recursive_wrapper as b
ON a.family = b.family AND a.node = b.node AND a.depth < b.depth
WHERE a.node = 'A'
ORDER BY 1,2,4;
给出:
所以我们只想保留那些计数为零的内容,我们可以在 QUALIFY 中做到这一点
SELECT a.*
--,b.path as b_path
--,b.depth as b_depth
--,startswith(b.path, a.path) as starts
--,sum(iff(starts,1,0)) over(partition by a.family, a.node, a.path) as dominated_count
FROM recursive_wrapper as a
LEFT JOIN recursive_wrapper as b
ON a.family = b.family AND a.node = b.node AND a.depth < b.depth
WHERE a.node = 'A'
QUALIFY sum(iff(startswith(b.path, a.path),1,0)) over(partition by a.family, a.node, a.path) = 0
ORDER BY 1,2,4;
给出:
这会引发循环,我们可以通过将数组放回原处并检查数组中已有的值来防止循环
WITH nodes(node, family) AS (
SELECT *
FROM VALUES
('A',1),
('B',1),
('C',1),
('D',1),
('E',1),
('F',1),
('G',1),
('H',1),
('I',2),
('J',2)
), edges(_from, _to) AS (
SELECT *
FROM VALUES
('A','B'),
('B','C'),
('B','D'),
('D','E'),
('F','G'),
('G','H')
), recursive_wrapper AS (
WITH recur AS (
-- Anchor Clause
select
family,
node,
node as tail,
node::text path, /* this cast it to max size, verse the largest size of the input */
array_construct(node) a_path,
1 as depth
from nodes
union all
-- Recursive Clause
select r.family,
r.node,
e._to as tail,
r.path || '_' || e._to as path,
ARRAY_APPEND(r.a_path, e._to) as a_path,
r.depth + 1 as depth
from recur AS r
join edges AS e
on r.tail = e._from AND ARRAY_CONTAINS( e._to::variant , r.a_path ) = FALSE
)
SELECT *
FROM recur
)
SELECT a.*
FROM recursive_wrapper as a
LEFT JOIN recursive_wrapper as b
ON a.family = b.family AND a.node = b.node AND a.depth < b.depth
QUALIFY sum(iff(startswith(b.path, a.path),1,0)) over(partition by a.family, a.node, a.path) = 0
ORDER BY 1,2,4;
关于sql - 将边描述表转换为包含树的数组列表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/71206840/