PostgreSQL - How to create a column that flags rows where another column is positive over a specific range of data

Tags: postgresql

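I am new to PostgreSQL and learning every day. I am using PostgreSQL 9.4.

I have daily data, and I want to create a binary variable that equals 1 when another variable (here, the minimum flow) is positive for at least 5 consecutive days.

The data has the following structure ("test" is the variable I want to create):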

 group_id |    date    | min_flow | test
----------+------------+----------+------
        1 | 2012-02-01 |        0 |    0
        1 | 2012-02-02 |        0 |    0
        1 | 2012-02-03 |      1.5 |    1
        1 | 2012-02-04 |        1 |    1
        1 | 2012-02-05 |      0.7 |    1
        1 | 2012-02-06 |      0.8 |    1
        1 | 2012-02-07 |      1.2 |    1
        1 | 2012-02-08 |      1.5 |    1
        1 | 2012-02-09 |        0 |    0
        1 | 2012-02-10 |        0 |    0
        1 | 2012-02-11 |      0.9 |    0
        1 | 2012-02-12 |      1.2 |    0
        1 | 2012-02-13 |        0 |    0
        1 | 2012-02-14 |        0 |    0
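To try the queries in this thread, the sample can be loaded with a script along these lines. This is only a sketch: the table name a_table and the column name _date_ follow the queries in this thread, while the column types and NOT NULL constraints are assumptions, since the post does not state them. The test column is left out because it is the value to be computed.

```sql
-- Hypothetical setup: the column types are assumed, not given in the post.
create table a_table (
    group_id integer not null,
    _date_   date    not null,
    min_flow numeric not null
);

-- The 14 sample rows shown above.
insert into a_table (group_id, _date_, min_flow) values
    (1, '2012-02-01', 0),
    (1, '2012-02-02', 0),
    (1, '2012-02-03', 1.5),
    (1, '2012-02-04', 1),
    (1, '2012-02-05', 0.7),
    (1, '2012-02-06', 0.8),
    (1, '2012-02-07', 1.2),
    (1, '2012-02-08', 1.5),
    (1, '2012-02-09', 0),
    (1, '2012-02-10', 0),
    (1, '2012-02-11', 0.9),
    (1, '2012-02-12', 1.2),
    (1, '2012-02-13', 0),
    (1, '2012-02-14', 0);
```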

I achieved this with window functions, as follows:

SELECT CASE WHEN min_flow > 0                                
    AND                                        
    (
    -- current row and the next 4 rows all have min_flow > 0
    ((lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0     
    AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lead(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lead(min_flow, 4) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row and the previous 4 rows all have min_flow > 0
    (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lag(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 4) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row, the next 3 rows and the previous row all have min_flow > 0
    (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lead(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row, the next 2 rows and the previous 2 rows all have min_flow > 0
    (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row, the next row and the previous 3 rows all have min_flow > 0
    (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    )
    THEN 1 ELSE 0 END AS test
FROM a_table;

However, I would like to know whether there is a better or more efficient way.

Any help would be appreciated!

Thanks in advance!

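Best answer

Number all rows with row_number(), number them again within the positive and non-positive min_flow partitions, and use the difference between the two numbers to identify groups of consecutive rows: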

select 
    group_id,
    _date_,
    min_flow,
    (count(*) over w_diff > 4)::int test
from (
    select *, 
        row_number() over w_all rn_all, 
        row_number() over w_pos rn_pos
    from a_table
    window 
        w_all as (order by _date_),
        w_pos as (partition by min_flow > 0 order by _date_)
    ) s
    window w_diff as (partition by rn_all- rn_pos)
    order by _date_;

A query that illustrates the method:

select 
    *,
    rn_all- rn_pos diff,
    (count(*) over w_diff > 4)::int test
from (
    select *, 
        row_number() over w_all rn_all, 
        row_number() over w_pos rn_pos
    from a_table
    window 
        w_all as (order by _date_),
        w_pos as (partition by min_flow > 0 order by _date_)
    ) s
    window w_diff as (partition by rn_all- rn_pos)
    order by _date_;

 group_id |   _date_   | min_flow | rn_all | rn_pos | diff | test 
----------+------------+----------+--------+--------+------+------
        1 | 2012-02-01 |        0 |      1 |      1 |    0 |    0
        1 | 2012-02-02 |        0 |      2 |      2 |    0 |    0
        1 | 2012-02-03 |      1.5 |      3 |      1 |    2 |    1
        1 | 2012-02-04 |        1 |      4 |      2 |    2 |    1
        1 | 2012-02-05 |      0.7 |      5 |      3 |    2 |    1
        1 | 2012-02-06 |      0.8 |      6 |      4 |    2 |    1
        1 | 2012-02-07 |      1.2 |      7 |      5 |    2 |    1
        1 | 2012-02-08 |      1.5 |      8 |      6 |    2 |    1
        1 | 2012-02-09 |        0 |      9 |      3 |    6 |    0
        1 | 2012-02-10 |        0 |     10 |      4 |    6 |    0
        1 | 2012-02-11 |      0.9 |     11 |      7 |    4 |    0
        1 | 2012-02-12 |      1.2 |     12 |      8 |    4 |    0
        1 | 2012-02-13 |        0 |     13 |      5 |    8 |    0
        1 | 2012-02-14 |        0 |     14 |      6 |    8 |    0
(14 rows)
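The query above numbers rows by _date_ only and groups on the difference alone, which is enough for the single-group sample. On other data it may need tightening. With several group_id values the numbering should restart per group. A run of five or more non-positive days would also satisfy count(*) > 4. The difference can also coincide for a positive and a non-positive run: for the sequence P P P N N N P P, both the N run and the second P run get a difference of 3, so together they count as 5 rows. Below is a sketch of a stricter variant. It assumes one row per group per day and a non-null min_flow, and the names rn_sign, w_sign and w_island are made up for the illustration.

```sql
select
    group_id,
    _date_,
    min_flow,
    -- flag only positive rows whose run has at least 5 rows
    (min_flow > 0 and count(*) over w_island > 4)::int as test
from (
    select *,
        row_number() over w_all  as rn_all,   -- position within the group
        row_number() over w_sign as rn_sign   -- position among rows of the same sign
    from a_table
    window
        w_all  as (partition by group_id order by _date_),
        w_sign as (partition by group_id, min_flow > 0 order by _date_)
    ) s
-- the sign is part of the key, so positive and non-positive runs never merge
window w_island as (partition by group_id, min_flow > 0, rn_all - rn_sign)
order by group_id, _date_;
```

On the sample data this should give the same test values as the output above, since the sample has a single group, no long run of zeros, and no colliding differences.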

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/33946196/
