sql - Most frequently repeated value within a specific time window

I have the following structure:

table user
user_id | month_year | fruit
------------------------------
1       | 2021-01    | apple
1       | 2021-01    | melon
1       | 2021-01    | orange
1       | 2021-02    | grape
1       | 2021-02    | orange
1       | 2021-02    | kiwi
1       | 2021-03    | grape
1       | 2021-03    | pear
1       | 2021-03    | banana
1       | 2021-04    | orange
1       | 2021-04    | kiwi
1       | 2021-04    | banana
1       | 2021-05    | grape
1       | 2021-05    | pear
1       | 2021-05    | kiwi

I want the following result:

user     | month_year |            fruits            |  two_months_most_freq
-------------------------------------------------------------------------
1        | 2021-01    | apple, melon, orange         | orange
1        | 2021-02    | grape, orange, kiwi          | orange
1        | 2021-03    | grape, pear, banana          | grape
1        | 2021-04    | orange, kiwi, banana         | banana
1        | 2021-05    | grape, pear, kiwi            | kiwi

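Clarification: in the last column I want the fruit that appears most often over the last 2 months, in other words the fruit repeated most often across the current row and the previous row. Note that the first row should return orange, because when the preceding window frame is not available, the following window frame should be used instead.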
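To make the rule concrete, here is a minimal Python sketch of it, run against the sample rows above. It is only an illustration and not part of the original question: the names `ROWS`, `month_index` and `two_months_most_freq` are hypothetical, ties are broken by first occurrence, and falling back to the month's own fruits when neither neighbouring month exists is an assumption the question does not cover.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (user_id, month_year, fruit).
ROWS = [
    (1, "2021-01", "apple"), (1, "2021-01", "melon"), (1, "2021-01", "orange"),
    (1, "2021-02", "grape"), (1, "2021-02", "orange"), (1, "2021-02", "kiwi"),
    (1, "2021-03", "grape"), (1, "2021-03", "pear"), (1, "2021-03", "banana"),
    (1, "2021-04", "orange"), (1, "2021-04", "kiwi"), (1, "2021-04", "banana"),
    (1, "2021-05", "grape"), (1, "2021-05", "pear"), (1, "2021-05", "kiwi"),
]


def month_index(month_year):
    """Map 'YYYY-MM' to an integer so that consecutive months differ by 1."""
    year, month = map(int, month_year.split("-"))
    return 12 * year + month


def two_months_most_freq(rows):
    """Return {(user_id, month_year): most frequent fruit over two months}."""
    by_month = defaultdict(list)  # (user_id, month index) -> fruits
    labels = {}                   # (user_id, month index) -> 'YYYY-MM'
    for user_id, month_year, fruit in rows:
        key = (user_id, month_index(month_year))
        by_month[key].append(fruit)
        labels[key] = month_year

    def window_top(user_id, idx):
        # Most frequent fruit over month idx and the month before it.
        fruits = by_month.get((user_id, idx - 1), []) + by_month.get((user_id, idx), [])
        return Counter(fruits).most_common(1)[0][0]

    result = {}
    for user_id, idx in by_month:
        has_prev = (user_id, idx - 1) in by_month
        has_next = (user_id, idx + 1) in by_month
        if has_prev or not has_next:
            # Normal case (or assumed fallback when the month is isolated).
            top = window_top(user_id, idx)
        else:
            # No previous month: reuse the window of the following month.
            top = window_top(user_id, idx + 1)
        result[(user_id, labels[(user_id, idx)])] = top
    return result


for (user_id, month_year), fruit in sorted(two_months_most_freq(ROWS).items()):
    print(user_id, month_year, fruit)
```

On the sample data this gives orange, orange, grape, banana, kiwi for 2021-01 through 2021-05, matching the expected table.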

With the code below, I get the most frequent fruit across the entire dataset.

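select * from (
  -- one row per user and month, with that month's fruits concatenated
  select user_id, month_year,
    string_agg(distinct fruit) as fruits
  from user
  group by user_id, month_year
) join (
  -- the single most frequent fruit per user over ALL months (not per window)
  select user_id, fruit
  from user
  group by user_id, fruit
  qualify 1 = row_number() over(partition by user_id order by count(*) desc)
)
using (user_id)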

How can I apply this logic to a specific time window?

Best answer

Consider the approach below:

-- Outer query: keep a month's own result when the previous month exists,
-- otherwise take the result computed for the following month.
select user_id, month_year, fruits,
  if(prev_month_exists, two_months_most_freq, first_value(two_months_most_freq) over next_month) as two_months_most_freq
from (
  select user_id, month_year, fruits,
    -- most frequent fruit in the two-month list (ties are resolved arbitrarily)
    ( select fruit from unnest(split(two_month_fruits)) fruit
      group by fruit order by count(*) desc limit 1
    ) as two_months_most_freq,
    month, prev_month_exists
  from (
    -- select distinct collapses the per-fruit rows to one row per user and month
    select distinct user_id, month_year, month,
      string_agg(fruit) over(partition by user_id, month_year) fruits,
      string_agg(fruit) over last_two_months as two_month_fruits,
      0 < count(*) over prev_month as prev_month_exists
    -- `users` is the question's table (called `user` there);
    -- month = 12 * year + month number, so consecutive months differ by exactly 1
    from users, unnest([struct(
      12 * extract(year from date(month_year || '-01')) + extract(month from date(month_year || '-01')) as month
    )])
    window
      last_two_months as (partition by user_id order by month range between 1 preceding and current row),
      prev_month as (partition by user_id order by month range between 1 preceding and 1 preceding)
  )
)
window next_month as (partition by user_id order by month range between 1 following and 1 following)
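The window frames in the query rely on `month` being a plain integer in which consecutive calendar months differ by exactly 1, including across a year boundary. Because the frames use `range` rather than `rows`, they are defined by that value, so a missing month is not silently replaced by an earlier row. The small Python sketch below mirrors the arithmetic; the function names `month_index` and `in_last_two_months` are hypothetical and exist only for this illustration.

```python
def month_index(month_year):
    """Map 'YYYY-MM' to 12 * year + month, as the query's `month` column does."""
    year, month = map(int, month_year.split("-"))
    return 12 * year + month


def in_last_two_months(current, other):
    """True if `other` lies in `range between 1 preceding and current row` of `current`."""
    return 0 <= month_index(current) - month_index(other) <= 1


# December and the following January are adjacent in this numbering.
print(month_index("2022-01") - month_index("2021-12"))

# A month two steps back is outside the frame, so a gap is not bridged.
print(in_last_two_months("2021-03", "2021-01"))
```

Sorting on the raw `month_year` string would keep rows in order, but a `range` frame with a numeric offset needs a numeric ordering key, which is presumably why the query derives this index first.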

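If applied to the sample data in the question, the query above produces the following output: (output image not preserved)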

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/73616022/
