arrays - ClickHouse: calculating the difference between two dates, excluding certain days (but not weekends!)

Tags: arrays subquery clickhouse

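Maybe this is a simple question, but I can't figure out how to solve it. I'm working with ClickHouse 20.12.5.14. My goal is to get the difference, in minutes, between two datetimes, excluding certain days configured in a "non-working days" table. Note that I'm not interested in simply excluding Saturdays and Sundays: I need to be able to exclude only specific dates from the calculation.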

These are my tables (definitions at the end of the post):

select * 
from test_orders_workflow 
order by order_no, phase_no;

order_no|phase_no|phase_descr           |phase_date         |
--------|--------|----------------------|-------------------|
O_1342  |      10|Order placed          |2021-01-04 10:20:00|
O_1342  |      20|Payment processing    |2021-01-06 10:00:00|
O_1342  |      30|Order Fulfillment     |2021-01-08 11:00:00|
O_1342  |      40|Shipping and Delivery |2021-01-14 13:30:00|
O_6543  |      10|Order placed          |2021-02-03 15:00:00|
O_6543  |      20|Payment processing    |2021-02-03 17:30:00|
O_6543  |      25|Payment refused       |2021-02-03 17:33:00|
O_7836  |      10|Order placed          |2021-01-04 10:30:00|
O_7836  |      15|Order Cancelled       |2021-01-10 16:00:00|

select * from test_orders_nwd; 

not_w_day |
----------|
2021-01-01|
2021-01-07|
2021-02-01|

Of course, getting just the deltas is quite simple:

with t as (
select order_no, 
    arraySort(groupArray(phase_no)) phases_no, 
    arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr, 
    arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end, 
    arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin, 
    arrayMap((x,y) -> trunc((x-y)/60), phases_end, phases_begin) phases_duration
    from test_orders_workflow
    group by order_no
)
select
    order_no,
    phases_no as phase_no,
    phases_descr as phase_descr,
    phases_begin as phase_begin,
    phases_end as phase_end,
    phases_duration as minutes 
from
    t 
array join phases_no,
    phases_descr,
    phases_begin,
    phases_end,
    phases_duration
order by
    order_no,
    phases_no;

order_no|phase_no|phase_descr           |phase_begin        |phase_end          |minutes|
--------|--------|----------------------|-------------------|-------------------|-------|
O_1342  |      10|Order placed          |2021-01-04 10:20:00|2021-01-04 10:20:00|    0.0|
O_1342  |      20|Payment processing    |2021-01-04 10:20:00|2021-01-06 10:00:00| 2860.0|
O_1342  |      30|Order Fulfillment     |2021-01-06 10:00:00|2021-01-08 11:00:00| 2940.0|
O_1342  |      40|Shipping and Delivery |2021-01-08 11:00:00|2021-01-14 13:30:00| 8790.0|
O_6543  |      10|Order placed          |2021-02-03 15:00:00|2021-02-03 15:00:00|    0.0|
O_6543  |      20|Payment processing    |2021-02-03 15:00:00|2021-02-03 17:30:00|  150.0|
O_6543  |      25|Payment refused       |2021-02-03 17:30:00|2021-02-03 17:33:00|    3.0|
O_7836  |      10|Order placed          |2021-01-04 10:30:00|2021-01-04 10:30:00|    0.0|
O_7836  |      15|Order Cancelled       |2021-01-04 10:30:00|2021-01-10 16:00:00| 8970.0|

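But I can't figure out how to exclude from the difference the dates stored in the non-working days table. What I'd like to get is something like this (look at the last two columns in the third row and in the last row):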

order_no|phase_no|phase_descr           |phase_begin        |phase_end          |minutes|working_minutes|
--------|--------|----------------------|-------------------|-------------------|-------|---------------|
O_1342  |      10|Order placed          |2021-01-04 10:20:00|2021-01-04 10:20:00|    0.0|            0.0|
O_1342  |      20|Payment processing    |2021-01-04 10:20:00|2021-01-06 10:00:00| 2860.0|         2860.0|
O_1342  |      30|Order Fulfillment     |2021-01-06 10:00:00|2021-01-08 11:00:00| 2940.0|         1500.0|
O_1342  |      40|Shipping and Delivery |2021-01-08 11:00:00|2021-01-14 13:30:00| 8790.0|         8790.0|
O_6543  |      10|Order placed          |2021-02-03 15:00:00|2021-02-03 15:00:00|    0.0|            0.0|
O_6543  |      20|Payment processing    |2021-02-03 15:00:00|2021-02-03 17:30:00|  150.0|          150.0|
O_6543  |      25|Payment refused       |2021-02-03 17:30:00|2021-02-03 17:33:00|    3.0|            3.0|
O_7836  |      10|Order placed          |2021-01-04 10:30:00|2021-01-04 10:30:00|    0.0|            0.0|
O_7836  |      15|Order Cancelled       |2021-01-04 10:30:00|2021-01-10 16:00:00| 8970.0|         7530.0|

My approach was to include a count of the non-working days in the query:

...
count(*)*24*60 from test_orders_nwd where not_w_day between the first and the second date
...

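But it didn't work, because ClickHouse doesn't let you include a subquery like this (without a join), either in a standard query or when using arrays. For example, the following raises an exception: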

select order_no, phase_no, phase_descr, phase_begin, phase_end, minutes, 
minutes - (select count(*) 
            from test_orders_nwd tt
            where tt.not_w_day between phase_begin and phase_end
          )
from (
with t as (
select order_no, 
    arraySort(groupArray(phase_no)) phases_no, 
    arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr, 
    arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end, 
    arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin, 
    arrayMap((x,y) -> trunc((x-y)/60), phases_end, phases_begin) phases_duration
    from test_orders_workflow
    group by order_no
)
select
    order_no,
    phases_no as phase_no,
    phases_descr as phase_descr,
    phases_begin as phase_begin,
    phases_end as phase_end,
    phases_duration as minutes
     
from
    t 
array join phases_no,
    phases_descr,
    phases_begin,
    phases_end,
    phases_duration
order by
    order_no,
    phases_no
);

--> 
ClickHouse exception, code: 47, host: 10.0.1.137, port: 8123; Code: 47, e.displayText() = DB::Exception: Missing columns: 'phase_end' 'phase_begin' while processing query: 'SELECT count() FROM test_orders_nwd AS tt WHERE (not_w_day >= phase_begin) AND (not_w_day <= phase_end)', required columns: 'not_w_day' 'phase_begin' 'phase_end', source columns: 'not_w_day': While processing (SELECT count(*) FROM test_orders_nwd AS tt WHERE (tt.not_w_day >= phase_begin) AND (tt.not_w_day <= phase_end)) AS _subquery13741024: While processing minutes - ((SELECT count(*) FROM test_orders_nwd AS tt WHERE (tt.not_w_day >= phase_begin) AND (tt.not_w_day <= phase_end)) AS _subquery13741024) (version 20.12.5.14 (official build))

The same happens when the external table is referenced from inside the arrayMap function.
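
For context, the error message is consistent with how ClickHouse 20.12 handles scalar subqueries: the subquery is analyzed on its own, so it can only see the columns of test_orders_nwd, and phase_begin and phase_end from the outer query are reported as missing. A subquery that does not reference outer columns is accepted. The following is a minimal sketch, assuming the two tables defined at the end of this question; the aliases nwd and on_non_working_day are illustrative names that do not appear in the original queries.

```sql
-- Sketch only: the scalar subquery references no outer columns, so it is
-- evaluated on its own and its result (an array of dates) is usable on every row.
SELECT
    order_no,
    phase_no,
    phase_date,
    (SELECT groupArray(not_w_day) FROM test_orders_nwd) AS nwd,  -- illustrative alias
    -- 1 if the phase date falls on a configured non-working day, else 0
    has(nwd, toDate(phase_date)) AS on_non_working_day
FROM test_orders_workflow
ORDER BY order_no, phase_no;
```

Once the non-working days are available as an array, they can be passed to array functions and lambdas like any other column.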

By the way, using a static array works fine, but I don't see how the dates in the array could be compared with "x" and "y" inside the lambda function:

select order_no, phase_no, phase_descr, phase_begin, phase_end, minutes, minutes_wd
from (
with t as (
with ['2021-01-04','2021-01-05'] as excluded_days
select order_no, 
    arraySort(groupArray(phase_no)) phases_no, 
    arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr, 
    arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end, 
    arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin, 
    arrayMap((x,y) -> trunc((x-y)/60), phases_end, phases_begin) phases_duration,
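    -- convert the delta from seconds to minutes first, then subtract the excluded days (also in minutes)
    arrayMap((x,y) -> trunc((x-y)/60) - length(excluded_days)*24*60, phases_end, phases_begin) phases_duration_wd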
    from test_orders_workflow
    group by order_no
)
select
    order_no,
    phases_no as phase_no,
    phases_descr as phase_descr,
    phases_begin as phase_begin,
    phases_end as phase_end,
    phases_duration as minutes,
    phases_duration_wd as minutes_wd
from
    t 
array join phases_no,
    phases_descr,
    phases_begin,
    phases_end,
    phases_duration,
    phases_duration_wd
order by
    order_no,
    phases_no
);

What should I do?

Thanks in advance for any help. If you want to try it out, here are the definitions:

create table test_orders_workflow 
(order_no String, phase_no Int8, phase_descr String, phase_date Datetime) engine = Log;  

insert into test_orders_workflow values 
('O_1342',10,'Order placed ',toDateTime('2021-01-04 10:20:00')),
('O_1342',20,'Payment processing',toDateTime('2021-01-06 10:00:00')),
('O_1342',30,'Order Fulfillment ',toDateTime('2021-01-08 11:00:00')),
('O_1342',40,'Shipping and Delivery ',toDateTime('2021-01-14 13:30:00')),
('O_7836',10,'Order placed ',toDateTime('2021-01-04 10:30:00')),
('O_7836',15,'Order Cancelled ',toDateTime('2021-01-10 16:00:00')),
('O_6543',10,'Order placed ',toDateTime('2021-02-03 15:00:00')),
('O_6543',20,'Payment processing',toDateTime('2021-02-03 17:30:00')),
('O_6543',25,'Payment refused',toDateTime('2021-02-03 17:33:00')); 

create table test_orders_nwd (not_w_day date) engine = Log; 

insert into test_orders_nwd values 
('2021-01-01'),
('2021-01-07'), 
('2021-02-01'); 

Best answer

with t as (
select 
    (select groupArray(not_w_day) from test_orders_nwd) as gnot_w_day,
    order_no, 
    arraySort(groupArray(phase_no)) phases_no, 
    arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr, 
    arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end, 
    arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin, 
    arrayMap((x,y) -> (trunc((x-y)/60), trunc((x-y)/60) - 24*60*length(arrayFilter(z -> z between y and x, gnot_w_day))), phases_end, phases_begin) phases_duration
    from test_orders_workflow
    group by order_no
)
select
    order_no,
    phase_no,
    phase_descr,
    phase_begin,
    phase_end,
    t_minutes.1 minutes,
    t_minutes.2 working_minutes 
from
    t 
array join phases_no as phase_no,
    phases_descr as phase_descr,
    phases_begin as phase_begin,
    phases_end as phase_end,
    phases_duration as t_minutes
order by
    order_no,
    phase_no;

┌─order_no─┬─phase_no─┬─phase_descr────────────┬─────────phase_begin─┬───────────phase_end─┬─minutes─┬─working_minutes─┐
│ O_1342   │       10 │ Order placed           │ 2021-01-04 10:20:00 │ 2021-01-04 10:20:00 │       0 │               0 │
│ O_1342   │       20 │ Payment processing     │ 2021-01-04 10:20:00 │ 2021-01-06 10:00:00 │    2860 │            2860 │
│ O_1342   │       30 │ Order Fulfillment      │ 2021-01-06 10:00:00 │ 2021-01-08 11:00:00 │    2940 │            1500 │
│ O_1342   │       40 │ Shipping and Delivery  │ 2021-01-08 11:00:00 │ 2021-01-14 13:30:00 │    8790 │            8790 │
│ O_6543   │       10 │ Order placed           │ 2021-02-03 15:00:00 │ 2021-02-03 15:00:00 │       0 │               0 │
│ O_6543   │       20 │ Payment processing     │ 2021-02-03 15:00:00 │ 2021-02-03 17:30:00 │     150 │             150 │
│ O_6543   │       25 │ Payment refused        │ 2021-02-03 17:30:00 │ 2021-02-03 17:33:00 │       3 │               3 │
│ O_7836   │       10 │ Order placed           │ 2021-01-04 10:30:00 │ 2021-01-04 10:30:00 │       0 │               0 │
│ O_7836   │       15 │ Order Cancelled        │ 2021-01-04 10:30:00 │ 2021-01-10 16:00:00 │    8970 │            7530 │
└──────────┴──────────┴────────────────────────┴─────────────────────┴─────────────────────┴─────────┴─────────────────┘
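
The step that does the work in the query above is the arrayFilter call inside arrayMap: for each phase it keeps the non-working days that fall between the phase's begin and end, and subtracts 24*60 minutes for each one. As a rough sketch, the same arithmetic can be checked for a single phase with literal values. Here the literals stand in for the third row (O_1342, phase 30) and for the contents of test_orders_nwd, the alias excluded_days is an illustrative name, and arrayCount is swapped in for length(arrayFilter(...)).

```sql
-- Illustrative only: literal values replace the table data for one phase.
WITH
    toDateTime('2021-01-06 10:00:00') AS phase_begin,
    toDateTime('2021-01-08 11:00:00') AS phase_end,
    [toDate('2021-01-01'), toDate('2021-01-07'), toDate('2021-02-01')] AS gnot_w_day
SELECT
    -- elapsed minutes between the two datetimes
    trunc((phase_end - phase_begin) / 60) AS minutes,
    -- number of non-working days lying between begin and end
    arrayCount(z -> z BETWEEN phase_begin AND phase_end, gnot_w_day) AS excluded_days,
    -- subtract one full day (24*60 minutes) per excluded day
    minutes - 24 * 60 * excluded_days AS working_minutes;
```

With these values the query should reproduce the 2940 and 1500 shown for that row. Note that the Date values are compared against DateTime values, so a non-working day appears to be counted only when its midnight lies within the phase, and a full 24 hours is subtracted for each one. Phases that begin or end partway through a non-working day would need a finer calculation.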

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/66768089/
