mysql - Conditional window function

I have a sales table that looks like this:

store_id    cust_id   txn_id   txn_date   amt       industry
200         1         1        20180101   21.01     1000   
200         2         2        20200102   20.01     1000
200         2         3        20200103   19        1000
200         3         4        20180103   19        1000
200         4         5        20200103   21.01     1000
300         2         6        20200104   1.39      2000
300         1         7        20200105   12.24     2000
300         1         8        20200105   25.02     2000
400         2         9        20180106   103.1     1000
400         2         10       20200107   21.3      1000

Here is the code that generates this sample table:

CREATE TABLE sales(
    store_id INT,
    cust_id INT,
    txn_id INT,
    txn_date bigint,
    amt float,
    industry INT);

INSERT INTO sales VALUES(200,1,1,20180101,21.01,1000);
INSERT INTO sales VALUES(200,2,2,20200102,20.01,1000);
INSERT INTO sales VALUES(200,2,3,20200103,19.00,1000);
INSERT INTO sales VALUES(200,3,4,20180103,19.00,1000);
INSERT INTO sales VALUES(200,4,5,20200103,21.01,1000);
INSERT INTO sales VALUES(300,2,6,20200104,1.39,2000);
INSERT INTO sales VALUES(300,1,7,20200105,12.24,2000);
INSERT INTO sales VALUES(300,1,8,20200105,25.02,2000);
INSERT INTO sales VALUES(400,2,9,20180106,103.1,1000);
INSERT INTO sales VALUES(400,2,10,20200107,21.3,1000);

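What I would like to do is create a new table, results, that answers the following question: since January 3, 2020, what percentage of my VIP customers i) shopped only at my store; ii) shopped at my store and at other stores in the same industry; iii) shopped only at other stores in the same industry? A VIP customer is defined as someone who has shopped at the given store at least once since 2019.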

This is the target output table:

store    industry   pct_my_store_only   pct_both   pct_other_stores_only
200      1000         0.5               0.5         0.0
300      2000         0.5               0.5         0.0
400      1000         0.0               1.0         0.0
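
As a quick sanity check on the VIP definition, a sketch like the following lists the VIP customers of each store straight from the sample data. It assumes MySQL 8 and the `sales` table created above; the column alias `vip_customers` is made up for illustration.

```sql
-- Customers with at least one purchase at the store since 2019.
select store_id,
       group_concat(distinct cust_id order by cust_id) as vip_customers
from sales
where txn_date >= 20190101
group by store_id;
-- store 200: '2,4'
-- store 300: '1,2'
-- store 400: '2'
```

Each percentage in the target table is then a fraction of these per-store customer sets, for example 1 of 2 customers for store 200.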

I am trying to use window functions to accomplish this. Here is what I have so far:

CREATE TABLE results as
    SELECT s.store_id, s.industry,
    COUNT(DISTINCT (CASE WHEN s.txn_date>=20200103 THEN s.cust_id END)) * 1.0 / sum(count(DISTINCT (CASE WHEN s.txn_date>=20200103 THEN s.cust_id END))) OVER (PARTITION BY s.industry) AS pct_my_store_only
    ...AS pct_both
    ...AS pct_other_stores_only
    FROM sales s
    WHERE s.txn_date>=20190101 
    GROUP BY s.store_id, s.industry;

The above does not seem to be correct; how can I fix this?
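
One way to see what the attempted expression measures is to run its first column on its own. The sketch below assumes MySQL 8 and the sample `sales` table, and uses the table alias `s` throughout; the aliases `recent_customers` and `share_of_industry` are illustrative names, not part of the original query.

```sql
select s.store_id, s.industry,
       -- distinct customers seen at this store since 2020-01-03
       count(distinct case when s.txn_date >= 20200103 then s.cust_id end) as recent_customers,
       -- the same count divided by the sum of those counts across the industry
       count(distinct case when s.txn_date >= 20200103 then s.cust_id end) * 1.0
         / sum(count(distinct case when s.txn_date >= 20200103 then s.cust_id end))
             over (partition by s.industry) as share_of_industry
from sales s
where s.txn_date >= 20190101
group by s.store_id, s.industry;
-- recent_customers on the sample data: store 200 -> 2, store 300 -> 2, store 400 -> 1
-- share_of_industry: roughly 0.667 (200), 1.0 (300), 0.333 (400)
```

Read this way, the ratio is each store's share of the industry's per-store customer counts (customer 2 is counted once for store 200 and again for store 400), not a classification of each VIP customer as "my store only", "both", or "other stores only".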

Best answer

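Join the distinct store_ids and industries to the concatenated distinct store_ids and industries of each customer, then use the window function avg() together with the function find_in_set() to work out what share of customers have or have not shopped at each store: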

with 
  stores as (
    select distinct store_id, industry
    from sales
    where txn_date >= 20190103
  ),
  customers as (
    select cust_id, 
           group_concat(distinct store_id) stores,
           group_concat(distinct industry) industries 
    from sales
    where txn_date >= 20190103
    group by cust_id
  ),
  cte as (
    select *,
      avg(concat(s.store_id) = concat(c.stores)) over (partition by s.store_id, s.industry) pct_my_store_only,
      avg(find_in_set(s.store_id, c.stores) = 0) over (partition by s.industry) pct_other_stores_only
    from stores s inner join customers c 
    on find_in_set(s.industry, c.industries) and find_in_set(s.store_id, c.stores)
  )
select distinct store_id, industry, 
       pct_my_store_only,
       1 - pct_my_store_only - pct_other_stores_only pct_both,   
       pct_other_stores_only
from cte       
order by store_id, industry

See the demo.
Results:

> store_id | industry | pct_my_store_only | pct_both | pct_other_stores_only
> -------: | -------: | ----------------: | -------: | --------------------:
>      200 |     1000 |            0.5000 |   0.5000 |                0.0000
>      300 |     2000 |            0.5000 |   0.5000 |                0.0000
>      400 |     1000 |            0.0000 |   1.0000 |                0.0000
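
To make the intermediate values concrete, the sketch below runs the `customers` step on its own and shows the two building blocks the query relies on. It assumes MySQL 8 and the sample `sales` table; the `order by` inside `group_concat()` is an addition for deterministic output and is not in the query above.

```sql
-- Per-customer lists, using the same cutoff as the query above.
select cust_id,
       group_concat(distinct store_id order by store_id) as stores,
       group_concat(distinct industry order by industry) as industries
from sales
where txn_date >= 20190103
group by cust_id;
-- cust_id 1: stores '300',         industries '2000'
-- cust_id 2: stores '200,300,400', industries '1000,2000'
-- cust_id 4: stores '200',         industries '1000'

-- find_in_set() returns the 1-based position in a comma-separated list, or 0.
select find_in_set(300, '200,300,400') as hit,    -- 2
       find_in_set(500, '200,300,400') as miss;   -- 0

-- concat() forces a string comparison; a bare integer would be compared
-- numerically, and '200,300,400' converts to the number 200 (with a warning).
select concat(200) = '200,300,400' as as_strings, -- 0
       200 = '200,300,400'         as as_numbers; -- 1
```

Joining these lists back to the stores gives customers 2 and 4 for store 200, customers 1 and 2 for store 300, and customer 2 for store 400; averaging the string-equality flag within each store yields 0.5, 0.5, and 0.0. Because the join condition already requires `find_in_set(s.store_id, c.stores)` to be nonzero, `pct_other_stores_only` comes out as 0 for every row of this query.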

Regarding mysql - Conditional window function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63038512/
