MySQL/memSQL does not use an index for a BETWEEN join condition

Tags: mysql query-optimization sqlperformance singlestore

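We have two tables:

  • A dates table, containing one date per day for the past 10 years and the next 10 years.
  • A states table with the following columns: start_date, end_date, state

The query we run looks like this: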

SELECT dates.date, COUNT(*)
FROM dates
JOIN states
ON dates.date BETWEEN states.start_date AND states.end_date
WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY dates.date
ORDER BY dates.date;

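According to the query plan, memSQL does not use an index for the JOIN condition, which makes the query slow. Is there a way to make the JOIN condition use an index?

We tried memSQL skiplist indexes on dates.date, states.start_date, states.end_date, and (states.start_date, states.end_date).

Table definitions and EXPLAIN output: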

CREATE TABLE `dates` (
  `date` date DEFAULT NULL,
  KEY `date_index` (`date`)
);

CREATE TABLE `states` (
  `start_date` datetime DEFAULT NULL,
  `end_date` datetime DEFAULT NULL,
  `state` varchar(256) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  KEY `start_date` (`start_date`),
  KEY `end_date` (`end_date`),
  KEY `start_date_end_date` (`start_date`,`end_date`)
);

+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.date] partitions:all est_rows:96 alias:remote_0                                                                               |
| Project [r2.date, CAST(COALESCE($0,0) AS SIGNED) AS `COUNT(*)`] est_rows:96                                                                         |
| Sort [r2.date]                                                                                                                                      |
| HashGroupBy [SUM(r2.`COUNT(*)`) AS $0] groups:[r2.date]                                                                                             |
| TableScan r2 storage:list stream:no                                                                                                                 |
| Repartition [r1.date, `COUNT(*)`] AS r2 shard_key:[date] est_rows:96 est_select_cost:26764032                                                       |
| HashGroupBy [COUNT(*) AS `COUNT(*)`] groups:[r1.date]                                                                                               |
| Filter [r1.date <= states.end_date]                                                                                                                 |
| NestedLoopJoin                                                                                                                                      |
| |---IndexRangeScan drstates_test.states, KEY start_date (start_date) scan:[start_date <= r1.date] est_table_rows:123904 est_filtered:123904         |
| TableScan r1 storage:list stream:no                                                                                                                 |
| Broadcast [dates.date] AS r1 distribution:tree est_rows:96                                                                                          |
| IndexRangeScan drstates_test.dates, KEY date_index (date) scan:[date >= '2017-01-01' AND date <= '2017-01-31'] est_table_rows:18628 est_filtered:96 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+

Best answer

ON dates.date BETWEEN states.start_date
                  AND states.end_date

is essentially unoptimizable. The only practical way to evaluate this condition is to tediously test every row.
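
To see why, it can help to write the join condition as the two inequalities it stands for. The sketch below is an equivalent form of the question's query against the same dates and states tables. It only illustrates how the condition splits; it is not meant as a faster query.

```sql
-- BETWEEN in the ON clause is shorthand for two independent inequalities.
SELECT dates.date, COUNT(*)
FROM dates
JOIN states
  ON  states.start_date <= dates.date   -- an index on start_date can bound this side
  AND states.end_date   >= dates.date   -- this side is then checked row by row
WHERE dates.date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY dates.date
ORDER BY dates.date;
```

In the EXPLAIN output from the question, the first inequality appears as the IndexRangeScan on KEY start_date (scan:[start_date <= r1.date]) and the second as the separate Filter [r1.date <= states.end_date]. The scan's est_filtered equals est_table_rows (123904), which suggests the range on start_date is not expected to narrow the rows much.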

If you are using MySQL and don't need the dates table, consider starting from

SELECT  *
    FROM  states
    WHERE  start_date >= '2017-01-01'
      AND  end_date    < '2017-01-01' + INTERVAL 1 MONTH;

Note that this works for any combination of DATE and DATETIME data types.
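
The half-open range in that query (at or after the first day, before the first day of the next month) is what makes it indifferent to the column type. A minimal sketch, assuming MySQL date arithmetic and the states table from the question; the two COUNT(*) queries are illustrative and not part of the original answer:

```sql
-- The upper bound is computed, so there is no need to know how many days the month has.
SELECT '2017-01-01' + INTERVAL 1 MONTH AS upper_bound;

-- Closed range: against a DATETIME column the bound '2017-01-31' is compared as
-- 2017-01-31 00:00:00, so rows later on the 31st are left out.
SELECT COUNT(*) AS closed_range
FROM states
WHERE end_date BETWEEN '2017-01-01' AND '2017-01-31';

-- Half-open range: covers all of January for DATE and DATETIME columns alike.
SELECT COUNT(*) AS half_open
FROM states
WHERE end_date >= '2017-01-01'
  AND end_date <  '2017-01-01' + INTERVAL 1 MONTH;
```

If states.end_date holds times other than midnight, the two counts can differ; for a pure DATE column they should match.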

Since the ultimate goal isn't clear to me, I'm not sure what to suggest next.

Source: this question on Stack Overflow: https://stackoverflow.com/questions/43181286/
