postgresql - 如何确定流网络有向图上的 Strahler 数

标签 postgresql recursive-query window-functions directed-graph transitive-closure-table

问题/示例/期望值

我需要确定一个 Strahler numberStrahler stream order 用于表示流网络的有向图。我可以导出信息forwards and backwards using WITH RECURSIVE queries ,但似乎我需要做一些不同的事情来确定 Strahler 数。

例如,这里有一个 19 段流网络,有 10 个支流和一个出水口。每个段的上游部分由一个节点 ID 表示。

stream_nodes

和表结构中的相同数据,其中段通过to_node连接,对于流域导出为null。

CREATE TABLE streams (
  node integer PRIMARY KEY,
  to_node integer REFERENCES streams(node),
  expected_order integer
);
INSERT INTO streams(node, to_node, expected_order) VALUES
(1, NULL, 4),
(2, 1, 4),
(3, 2, 3),
(4, 2, 3),
(5, 4, 3),
(6, 3, 2),
(7, 3, 2),
(8, 5, 2),
(9, 5, 2),
(10, 6, 1),
(11, 6, 1),
(12, 7, 1),
(13, 7, 1),
(14, 8, 1),
(15, 8, 1),
(16, 9, 1),
(17, 9, 1),
(18, 4, 1),
(19, 1, 1);

此处显示了 Strahler 数的预期结果 (expected_order):

expected_stream_order

共有三个规则(来自 GRASS 7.0 Manual ):

  1. 如果该节点没有子节点,则其 Strahler 阶数为 1。
  2. 如果该节点有一个且只有一个具有 Strahler 最大阶 i 的支流,并且所有其他支流的阶都小于 i,则该阶保持 i
  3. 如果该节点有两条或多条最大阶i的支流,则该节点的Strahler阶为i + 1。

我发现/尝试过的

从我挖掘解决这个问题的过程中发现,这个计算can be done with SQL (除了我认为他们的“SQL 脚本”是为 MS SQL Server 编写的)。但是,我还没有找到可以用 PostgreSQL 9.1 完成的事情。

我最好的尝试之一是计算每个节点的上游节点的数量,它正确地识别支流(一阶),而不是其他:

WITH RECURSIVE search_graph AS (
  SELECT node AS start_node, node
  FROM streams
  -- Connect downstream towards outlet(s)
  UNION ALL
  SELECT sg.start_node, n.node
  FROM streams n
  JOIN search_graph sg ON n.to_node = sg.node
)
SELECT start_node, count(sg.node) as upstream_nodes, expected_order
FROM search_graph sg
JOIN streams s ON sg.start_node = s.node
GROUP BY start_node, expected_order
ORDER BY upstream_nodes DESC, start_node;

 start_node | upstream_nodes | expected_order
------------+----------------+----------------
          1 |             19 |              4
          2 |             17 |              4
          4 |              9 |              3
          3 |              7 |              3
          5 |              7 |              3
          6 |              3 |              2
          7 |              3 |              2
          8 |              3 |              2
          9 |              3 |              2
         10 |              1 |              1
         11 |              1 |              1
         12 |              1 |              1
         13 |              1 |              1
         14 |              1 |              1
         15 |              1 |              1
         16 |              1 |              1
         17 |              1 |              1
         18 |              1 |              1
         19 |              1 |              1
(19 rows)

一个想法是使用nth_value(value any, nth integer) window functionappropriately set window frame range .但是,我不确定如何设置它,或者是否可以设置它来识别 Strahler 号码。另一个 [不太令人兴奋] 的想法是为每个 Strahler 数手动运行迭代,我希望我的真实世界数据在五到八个订单(迭代)之间。这可以通过 DO statement 来完成.但我们欢迎任何更好的想法。

最佳答案

我遇到了 CTE 的限制。递归 CTE 无法对自身执行 LEFT JOIN。只是在功能上完成了它。

现场测试:https://www.db-fiddle.com/f/8z58LCVhD62YvkeJjriW8d/5

create or replace function strahler(_parent int) returns table(
    node int, strahler_order int
)
as
$$
    select 
        s.node,
        case 
            -- If the node is a leaf (has no children), its Strahler number is one.
            when count(st.*) = 0 then 
                1

            when count(st.*) >= 2 then
                case 
                    -- If the node has one child with Strahler number i, 
                    -- and all other children have Strahler numbers less than i, 
                    -- then the Strahler number of the node is i again.
                    when min(st.strahler_order) < max(st.strahler_order) then
                        max(st.strahler_order)

                    -- If the node has two or more children with Strahler number i, 
                    -- and no children with greater number, 
                    -- then the Strahler number of the node is i + 1.
                    when min(st.strahler_order) = max(st.strahler_order) then
                        max(st.strahler_order) + 1                                          
                end
        end         
    from streams s
    left join lateral strahler(s.node) st  on true
    where _parent = 0 or s.to_node = _parent
    group by s.node
$$ language 'sql';

select st.node, s.expected_order, st.strahler_order
from strahler(0) st 
join streams s on st.node = s.node 
order by st.node;

测试:

select st.node, s.expected_order, st.strahler_order
from strahler(0) st 
join streams s on st.node = s.node 
order by st.node;

输出:

| node | expected_order | strahler_order |
| ---- | -------------- | -------------- |
| 1    | 4              | 4              |
| 2    | 4              | 4              |
| 3    | 3              | 3              |
| 4    | 3              | 3              |
| 5    | 3              | 3              |
| 6    | 2              | 2              |
| 7    | 2              | 2              |
| 8    | 2              | 2              |
| 9    | 2              | 2              |
| 10   | 1              | 1              |
| 11   | 1              | 1              |
| 12   | 1              | 1              |
| 13   | 1              | 1              |
| 14   | 1              | 1              |
| 15   | 1              | 1              |
| 16   | 1              | 1              |
| 17   | 1              | 1              |
| 18   | 1              | 1              |
| 19   | 1              | 1              |

这是最初的计划

现场测试:https://www.db-fiddle.com/f/8z58LCVhD62YvkeJjriW8d/1

with recursive search_graph as (
    select node as start_node, node
    from streams

    union all
    select sg.start_node, n.node
    from streams n
    join search_graph sg on n.to_node = sg.node
)
, get_kids as 
(
    select 
        s.node as kid, 
        count(sg.*) - 1 as kid_kids, 
        s.expected_order
    from streams s 
    join search_graph sg on s.node = sg.start_node 
    group by s.node, s.expected_order
    order by kid_kids
)
, order_safe as 
(
    select 
        row_number() over(s) eo, 

        gk.kid, 
        gk.kid_kids, 

        gk_kid.to_node as parent, 
        gk_p.kid_kids as siblings 
    from get_kids gk
    left join streams gk_kid on gk.kid = gk_kid.node
    left join get_kids gk_p on gk_kid.to_node = gk_p.kid
    window s as (order by gk_p.kid_kids /* siblings */, gk_kid.to_node  /* parent */) 
)    
select * from order_safe;

输出:

| eo  | kid | kid_kids | parent | siblings |
| --- | --- | -------- | ------ | -------- |
| 1   | 11  | 0        | 6      | 2        |
| 2   | 10  | 0        | 6      | 2        |
| 3   | 12  | 0        | 7      | 2        |
| 4   | 13  | 0        | 7      | 2        |
| 5   | 15  | 0        | 8      | 2        |
| 6   | 14  | 0        | 8      | 2        |
| 7   | 17  | 0        | 9      | 2        |
| 8   | 16  | 0        | 9      | 2        |
| 9   | 6   | 2        | 3      | 6        |
| 10  | 7   | 2        | 3      | 6        |
| 11  | 9   | 2        | 5      | 6        |
| 12  | 8   | 2        | 5      | 6        |
| 13  | 5   | 6        | 4      | 8        |
| 14  | 18  | 0        | 4      | 8        |
| 15  | 3   | 6        | 2      | 16       |
| 16  | 4   | 8        | 2      | 16       |
| 17  | 19  | 0        | 1      | 18       |
| 18  | 2   | 16       | 1      | 18       |
| 19  | 1   | 18       |        |          |

最初的计划是以安全顺序评估每个节点(将由 eo 字段促进),从具有较少兄弟节点的节点开始,直到具有许多兄弟节点的节点。然后在每个将被评估的节点上,还将检查其直接子节点(递归 CTE 将对其自身执行 LEFT JOIN),然后执行必要的 Strahler 的三个条件。但是,CTE 有一个局限性,递归 CTE 不能对自身进行 LEFT JOIN。

关于postgresql - 如何确定流网络有向图上的 Strahler 数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29202822/

相关文章:

postgresql - 使用带有存储库的 Spring 4 data jpa 增加或减少计数器值的预期方法是什么?

sql - 如何将具有树结构的表聚合到单个嵌套 JSON 对象?

sql - 加入两个 Hierarchical 查询以形成更大的 Hierarchy

php - 递归类别下的项目计数

postgresql - 使用窗函数进行限制和偏移

sql - 为多个分层组优化 SUM OVER PARTITION BY

postgresql - 避免在 PostgreSQL 8.3.4 中嵌套聚合函数

sql - postgresQL 中子产品的递归列表

node.js - 错误 : sorry, 已经有太多客户端

c# - 前向填充 .NET for Spark