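I'm having a hard time figuring out how to do this.
I have invoice data per day (for most days) that needs to be grouped by week. However, if a week runs into the next month, I need that bucket to contain only the days in the current month; the next bucket then runs from the 1st through the following Saturday, so that the next full week starts on a Sunday again.
Right now we don't group it at all and just export it by day, which gives us roughly 60 million rows for a rolling 2 years (it's more complex than the example, since it's also split by item and customer). It is then imported into our demand planning software, which has both weekly and monthly models. With daily data, it has no problem pouring the rows into the right buckets.
But since we're running into some time constraints, I want to cut down those roughly 60 million rows. The result still has to work exactly with both the weekly and the monthly models the data is imported into.
How can I group it like this?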
Example Data set
+------------+------------+
| date | sales |
+------------+------------+
| 2014-06-22 | 100 |
| 2014-06-23 | 200 |
| 2014-06-24 | 300 |
| 2014-06-25 | 150 |
| 2014-06-26 | 170 |
| 2014-06-27 | 210 |
| 2014-06-28 | 220 |
| 2014-06-29 | 120 |
| 2014-06-30 | 110 |
| 2014-07-01 | 190 |
| 2014-07-02 | 210 |
| 2014-07-03 | 100 |
| 2014-07-04 | 140 |
| 2014-07-05 | 150 |
| 2014-07-06 | 130 |
| 2014-07-07 | 420 |
| 2014-07-08 | 310 |
| 2014-07-09 | 290 |
| 2014-07-10 | 180 |
| 2014-07-11 | 140 |
| 2014-07-12 | 210 |
+------------+------------+
Expected Result:
+------------+------------+
| date | sum(sales) |
+------------+------------+
| 2014-06-22 | 1350 | 7 days in group
| 2014-06-29 | 230 | 2 days in group
| 2014-07-01 | 790 | 5 days in group
| 2014-07-06 | 1680 | 7 days in group
+------------+------------+
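To reproduce the example, the sample rows can be loaded with something like the following. This is a minimal sketch: the `tmp.testdata` table and `val` column names are taken from the queries in this post (the table above labels that column `sales`), while the column types and the primary key are assumptions.

```sql
-- Assumed schema: names match the queries in this post; types are guesses.
CREATE DATABASE IF NOT EXISTS tmp;

CREATE TABLE tmp.testdata (
    `date` DATE NOT NULL PRIMARY KEY,  -- one row per invoice day
    val    INT  NOT NULL               -- the "sales" figure from the example
);

-- The 21 sample rows from the example data set.
INSERT INTO tmp.testdata (`date`, val) VALUES
    ('2014-06-22', 100), ('2014-06-23', 200), ('2014-06-24', 300),
    ('2014-06-25', 150), ('2014-06-26', 170), ('2014-06-27', 210),
    ('2014-06-28', 220), ('2014-06-29', 120), ('2014-06-30', 110),
    ('2014-07-01', 190), ('2014-07-02', 210), ('2014-07-03', 100),
    ('2014-07-04', 140), ('2014-07-05', 150), ('2014-07-06', 130),
    ('2014-07-07', 420), ('2014-07-08', 310), ('2014-07-09', 290),
    ('2014-07-10', 180), ('2014-07-11', 140), ('2014-07-12', 210);
```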
Edit:
We came up with a workable solution. Feel free to improve on it if you like.
SELECT DATE(IF(
MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`)
, DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)
, DATE_FORMAT(`date`,'%Y-%m-01')
)) AS datekey
, SUM(val) AS valsum
FROM tmp.testdata
GROUP BY IF(
MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`) -- If the closest previous Sunday from date falls within the same month as the date...
, DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY) -- ...use the date of the closest previous Sunday as the key...
, DATE_FORMAT(`date`,'%Y-%m-01') -- ...otherwise use the 1st of the month the date falls in as the key (since that must mean the date falls in that opening partial week).
)
ORDER BY datekey
Thanks everyone! We combined a few of these and ended up with:
SELECT MIN(`date`) AS datekey
, SUM(val) AS valsum
FROM tmp.testdata
GROUP BY DATE_FORMAT(`date`, '%U'), MONTH(`date`), YEAR(`date`)
ORDER BY datekey
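Or, for cases where we always want the bucket key to be a Sunday or the 1st of the month (for example, when not every day has an invoice), we combined my solution with the one from the answers here, since the GROUP BY used here is faster: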
SELECT
DATE(IF(MONTH(DATE_SUB(`date`,
INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`),
DATE_SUB(`date`,
INTERVAL DAYOFWEEK(`date`) - 1 DAY),
DATE_FORMAT(`date`, '%Y-%m-01'))) AS datekey,
SUM(val) AS valsum
FROM
tmp.testdata
GROUP BY DATE_FORMAT(`date`, '%U') , MONTH(`date`) , YEAR(`date`)
ORDER BY datekey
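One caveat with the query above: it selects a non-aggregated expression on `date` that does not appear in the GROUP BY, so a server running with the `ONLY_FULL_GROUP_BY` SQL mode (part of the default mode since MySQL 5.7.5) will likely reject it. A possible variant, sketched under that assumption, wraps the key in `MIN()`. Every row in a bucket computes the same key, so the result should be unchanged.

```sql
-- Same bucketing as above; MIN() only exists to satisfy ONLY_FULL_GROUP_BY.
SELECT
    MIN(DATE(IF(
        -- Does the closest previous Sunday fall in the same month as the date?
        MONTH(DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY)) = MONTH(`date`),
        -- Yes: key the bucket on that Sunday.
        DATE_SUB(`date`, INTERVAL DAYOFWEEK(`date`) - 1 DAY),
        -- No: the date is in the month's opening partial week, so key on the 1st.
        DATE_FORMAT(`date`, '%Y-%m-01')
    ))) AS datekey,
    SUM(val) AS valsum
FROM tmp.testdata
GROUP BY DATE_FORMAT(`date`, '%U'), MONTH(`date`), YEAR(`date`)
ORDER BY datekey;
```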
Best answer
Here is something to consider.
calendar is a simple table of dates:
SELECT MIN(dt),YEARWEEK(dt),MONTH(dt) FROM calendar WHERE dt BETWEEN '2014-01-01' AND '2014-12-31' GROUP BY YEARWEEK(dt),MONTH(dt);
+------------+--------------+-----------+
| MIN(dt) | YEARWEEK(dt) | MONTH(dt) |
+------------+--------------+-----------+
| 2014-01-01 | 201352 | 1 |
| 2014-01-05 | 201401 | 1 |
| 2014-01-12 | 201402 | 1 |
| 2014-01-19 | 201403 | 1 |
| 2014-01-26 | 201404 | 1 |<-- Overlap
| 2014-02-01 | 201404 | 2 |<-- Overlap
| 2014-02-02 | 201405 | 2 |
| 2014-02-09 | 201406 | 2 |
| 2014-02-16 | 201407 | 2 |
| 2014-02-23 | 201408 | 2 |<-- Overlap
| 2014-03-01 | 201408 | 3 |<-- Overlap
| 2014-03-02 | 201409 | 3 |
| 2014-03-09 | 201410 | 3 |
| 2014-03-16 | 201411 | 3 |
| 2014-03-23 | 201412 | 3 |
| 2014-03-30 | 201413 | 3 |<-- Overlap
| 2014-04-01 | 201413 | 4 |<-- Overlap
| 2014-04-06 | 201414 | 4 |
| 2014-04-13 | 201415 | 4 |
| 2014-04-20 | 201416 | 4 |
| 2014-04-27 | 201417 | 4 |<-- Overlap
| 2014-05-01 | 201417 | 5 |<-- Overlap
| 2014-05-04 | 201418 | 5 |
| 2014-05-11 | 201419 | 5 |
| 2014-05-18 | 201420 | 5 |
| 2014-05-25 | 201421 | 5 |<-- No overlap
| 2014-06-01 | 201422 | 6 |<-- No overlap
| 2014-06-08 | 201423 | 6 |
| 2014-06-15 | 201424 | 6 |
| 2014-06-22 | 201425 | 6 |
| 2014-06-29 | 201426 | 6 |<-- Overlap
| 2014-07-01 | 201426 | 7 |<-- Overlap
| 2014-07-06 | 201427 | 7 |
| 2014-07-13 | 201428 | 7 |
| 2014-07-20 | 201429 | 7 |
| 2014-07-27 | 201430 | 7 |<-- Overlap
| 2014-08-01 | 201430 | 8 |<-- Overlap
| 2014-08-03 | 201431 | 8 |
| 2014-08-10 | 201432 | 8 |
| 2014-08-17 | 201433 | 8 |
| 2014-08-24 | 201434 | 8 |
| 2014-08-31 | 201435 | 8 |<-- Overlap
| 2014-09-01 | 201435 | 9 |<-- Overlap
| 2014-09-07 | 201436 | 9 |
| 2014-09-14 | 201437 | 9 |
| 2014-09-21 | 201438 | 9 |
| 2014-09-28 | 201439 | 9 |<-- Overlap
| 2014-10-01 | 201439 | 10 |<-- Overlap
| 2014-10-05 | 201440 | 10 |
| 2014-10-12 | 201441 | 10 |
| 2014-10-19 | 201442 | 10 |
| 2014-10-26 | 201443 | 10 |<-- Overlap
| 2014-11-01 | 201443 | 11 |<-- Overlap
| 2014-11-02 | 201444 | 11 |
| 2014-11-09 | 201445 | 11 |
| 2014-11-16 | 201446 | 11 |
| 2014-11-23 | 201447 | 11 |
| 2014-11-30 | 201448 | 11 |<-- Overlap
| 2014-12-01 | 201448 | 12 |<-- Overlap
| 2014-12-07 | 201449 | 12 |
| 2014-12-14 | 201450 | 12 |
| 2014-12-21 | 201451 | 12 |
| 2014-12-28 | 201452 | 12 |
+------------+--------------+-----------+
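The same grouping can be applied to the invoice data itself. The following is a sketch rather than a definitive implementation: it assumes `calendar` holds exactly one row per day in a DATE column `dt` (as in the listing above) and reuses the `tmp.testdata` table and `val` column from the question. Driving the query from `calendar` with a LEFT JOIN means each bucket key is always a Sunday or the 1st of a month, even when some days have no invoices.

```sql
-- One row per (week, month) bucket; days without invoices contribute 0.
SELECT
    MIN(c.dt)                AS datekey,  -- first calendar day in the bucket
    COALESCE(SUM(t.val), 0)  AS valsum
FROM calendar AS c
LEFT JOIN tmp.testdata AS t
       ON t.`date` = c.dt
WHERE c.dt BETWEEN '2014-06-22' AND '2014-07-12'  -- range of the sample data
GROUP BY YEARWEEK(c.dt), MONTH(c.dt)              -- a week split by a month end becomes two buckets
ORDER BY datekey;
```

For the sample rows in the question, this yields the four buckets listed under "Expected Result" (1350, 230, 790, 1680). `YEARWEEK()` is used here with its default mode, in which weeks start on Sunday.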
Regarding "mysql - group by week, but start a new group when the week runs into the next month", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24066140/