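mysql - Sliding, variable "window" with the highest row density

Tags: mysql sql

I am trying to retrieve, from a table, the time period in which records occur with the highest frequency/density.

Suppose I have a log table like this: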

datetime   | action | username | highest_time_slot
--------------------------------------------------
2013-09-30 | update | username | 
2013-12-15 | update | username |
2014-03-01 | update | username | *
2014-03-02 | update | username | *
2014-03-03 | update | username | *
2014-03-05 | update | username | *
2015-05-20 | update | username |

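As the table shows, the user's activity is most frequent between 2014-03-01 and 2014-03-05. Is there a clever way to retrieve this time period? Thanks for your help!

Best answer

Let's start with a table definition and some INSERT statements. This reflects the data as it was before you changed your question.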

create table log_test (
  datetime date not null,
  action varchar(15) not null,
  username varchar(15) not null,
  primary key (datetime, action, username)
);

insert into log_test values
('2013-09-30', 'update', 'username'),
('2013-12-15', 'update', 'username'),
('2014-03-01', 'update', 'username'),
('2014-03-02', 'update', 'username'),
('2014-03-03', 'update', 'username'),
('2014-03-05', 'update', 'username'),
('2015-05-20', 'update', 'username');

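Now we build a table of integers. A table like this is useful for many things; mine has several million rows. (There are ways to automate the insert statements.)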

create table integers (
  n integer not null,
  primary key (n)
);
insert into integers values 
 (0),  (1),  (2),  (3),  (4),  (5),  (6),  (7),  (8),  (9),
(10), (11), (12), (13), (14), (15), (16), (17), (18), (19),
(20), (21), (22), (23), (24), (25), (26), (27), (28), (29),
(30), (31), (32), (33), (34), (35), (36), (37), (38), (39),
(40), (41), (42), (43), (44), (45), (46), (47), (48), (49);
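
One way to automate the inserts, as an alternative to the hand-typed list above, is to cross join a ten-row table of digits with itself. This is only a sketch: the `digits` table and its column `d` are hypothetical helper names that do not appear in the original answer, and the statement assumes `integers` is still empty (otherwise the primary key will reject the duplicates).

```sql
-- Hypothetical helper table holding the digits 0 through 9.
create table digits (
  d integer not null,
  primary key (d)
);
insert into digits values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- Each combination of three digits yields one distinct integer
-- from 0 to 999.
insert into integers (n)
select ones.d + 10 * tens.d + 100 * hundreds.d
from digits ones
cross join digits tens
cross join digits hundreds;
```

Adding one more `cross join digits` with a multiplier of 1000 extends the range to 0 through 9999, and so on.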

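This statement gives us the dates from log_test, together with the number of days in the "window" we want to look at. You need select distinct, because more than one user might have the same date.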

select distinct datetime, t.n
from log_test
cross join (select n from integers where n between 10 and 40) t
order by datetime, t.n;
datetime     n
--
2013-09-30   10
2013-09-30   11
2013-09-30   12
...
2015-05-20   39
2015-05-20   40

We can use that result as a derived table, and do date arithmetic on it.

select datetime period_start, datetime + interval t2.n day period_end
from (
  select distinct datetime, t.n
  from log_test
  cross join (select n from integers where n between 10 and 40) t ) t2
order by period_start, period_end;
period_start  period_end
--
2013-09-30    2013-10-10
2013-09-30    2013-10-11
2013-09-30    2013-10-12
...
2015-05-20    2015-06-28
2015-05-20    2015-06-29

These intervals are off by one; 2013-09-30 to 2013-10-10 has 11 days. I'll leave that repair up to you.
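
One possible repair, sketched under the assumption that `n` should be the inclusive length of the window in days, is to add `n - 1` days instead of `n`. The column alias `window_days` is introduced here only for illustration.

```sql
-- Treat n as the inclusive window length: a 10-day window that
-- starts on 2013-09-30 then ends on 2013-10-09, not 2013-10-10.
select datetime period_start,
       t2.n window_days,
       datetime + interval (t2.n - 1) day period_end
from (
  select distinct datetime, t.n
  from log_test
  cross join (select n from integers where n between 10 and 40) t ) t2
order by period_start, period_end;
```

The queries that follow keep the original `interval t2.n day` arithmetic, so their output still shows the unrepaired end dates.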

The next version counts the number of "happenings" in each period. In your case, as the question was originally written, we just need to count the number of rows in each period.

select username, t3.period_start, t3.period_end, count(datetime) num_rows
from log_test
inner join (
  select datetime period_start, datetime + interval t2.n day period_end
  from (
    select distinct datetime, t.n
    from log_test
    cross join (select n from integers where n between 10 and 40) t ) t2
  order by period_start, period_end ) t3
on log_test.datetime between t3.period_start and t3.period_end
group by username, t3.period_start, t3.period_end
order by username, t3.period_start, t3.period_end;
username  period_start  period_end  num_rows
--
username  2013-09-30    2013-10-10  1
username  2013-09-30    2013-10-11  1
username  2013-09-30    2013-10-12  1
...
username  2014-03-01    2014-03-11  4
username  2014-03-01    2014-03-12  4
...
username  2015-05-20    2015-06-28  1
username  2015-05-20    2015-06-29  1

Finally, we can work some arithmetic magic, and get the density of each "window".

select username, 
       t3.period_start, t3.period_end, t3.n, 
       count(datetime) num_rows,
       count(datetime)/t3.n density
from log_test
inner join (
  select datetime period_start, t2.n, datetime + interval t2.n day period_end
  from (
    select distinct datetime, t.n
    from log_test
    cross join (select n from integers where n between 10 and 40) t ) t2
  order by period_start, period_end ) t3
on log_test.datetime between t3.period_start and t3.period_end
group by username, t3.period_start, t3.period_end, t3.n
order by username, density desc;
username  period_start  period_end  n   num_rows  density
--
username  2014-03-01    2014-03-11  10  4         0.4000
username  2014-03-01    2014-03-12  11  4         0.3636
username  2014-03-01    2014-03-13  12  4         0.3333
...
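
If only the single densest window is wanted rather than the full ranking, one option is to filter on a user and keep the first row. This is a sketch, not part of the original answer: the `where` clause, the tie-break on `period_start`, and the `limit 1` are additions, and 'username' is simply the placeholder value from the sample data.

```sql
select username,
       t3.period_start, t3.period_end, t3.n,
       count(datetime) num_rows,
       count(datetime)/t3.n density
from log_test
inner join (
  select datetime period_start, t2.n, datetime + interval t2.n day period_end
  from (
    select distinct datetime, t.n
    from log_test
    cross join (select n from integers where n between 10 and 40) t ) t2 ) t3
on log_test.datetime between t3.period_start and t3.period_end
where username = 'username'            -- the user of interest
group by username, t3.period_start, t3.period_end, t3.n
order by density desc, t3.period_start -- earliest window wins a tie
limit 1;
```

On the sample data this should return the 2014-03-01 to 2014-03-11 row that heads the ranking above. Getting the top window for every user in one statement would need a different approach, such as a per-user maximum joined back to this result.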

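Suggestions for improvement

You might want to change the date arithmetic. As it stands, these queries simply add 'n' days to the dates in the test table. But that means the periods won't be symmetric around gaps. For example, the date 2014-03-01 appears after a long gap. As it stands, we don't try to evaluate the density of a "window" that ends on 2014-03-01 (a "window" that approaches the first value from inside the gap that precedes it). That might be worth thinking about for your application.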
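
A minimal sketch of that idea, assuming each logged date is treated as the end of a window rather than its start, is to subtract the interval instead of adding it. Everything else mirrors the density query above, including its off-by-one window length.

```sql
select username,
       t3.period_start, t3.period_end, t3.n,
       count(datetime) num_rows,
       count(datetime)/t3.n density
from log_test
inner join (
  -- Same candidate dates and lengths as before, but the logged date
  -- is now the end of the window and the start lies n days earlier.
  select datetime - interval t2.n day period_start, t2.n, datetime period_end
  from (
    select distinct datetime, t.n
    from log_test
    cross join (select n from integers where n between 10 and 40) t ) t2 ) t3
on log_test.datetime between t3.period_start and t3.period_end
group by username, t3.period_start, t3.period_end, t3.n
order by username, density desc;
```

Whether forward-looking windows, backward-looking windows, or both are appropriate depends on what a "period" should mean in the application.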

Regarding mysql - Sliding, variable "window" with the highest row density, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23959544/
