sql - 如何调整表包含 30,000 多行的 7 表连接 MySQL 计数查询?

标签 sql mysql count performance

我有一个 sql 查询,用于计算复杂查询的结果数。当限制为 20 个结果时,实际的选择查询非常快,但经过大量优化后,计数版本在我当前的表上大约需要 4.5 秒。

如果我删除网站标签和画廊标签上的两个连接和 where 子句,查询将在 1.5 秒内执行。如果我创建 3 个单独的查询 - 一个选择付费网站,一个选择名称,一个将所有内容放在一起 - 我可以将查询缩短到 0.6 秒,这仍然不够好。这也将迫使我使用存储过程,因为我将不得不在 Hibernate 中进行总共 4 次查询。

对于“原样”查询,这里有一些信息:

Handler_read_key 为 1746669
Handler_read_next 为 1546324

gallery 表有 40,000 行
站点表有 900 行
名称表有 800 行
标签表有3560行

我对 MySQL 和调优还很陌生,我在以下方面有索引:

  • 标签表中的'term'列
  • 图库表中的“已发布”列
  • 名称表的“值”

我希望将此查询缩短到 0.1 毫秒。

SELECT count(distinct gallery.id)
from gallery gallery 
    inner join
        site site 
            on gallery.site_id = site.id 
    inner join
        site_to_tag p2t 
            on site.id = p2t.site_id 
    inner join
        tag site_tag 
            on p2t.tag_id = site_tag.id 
    inner join
        gallery_to_name g2mn 
            on gallery.id = g2mn.gallery_id 
    inner join
        name name 
            on g2mn.name_id = name.id 
    inner join
        gallery_to_tag g2t 
            on gallery.id = g2t.gallery_id 
    inner join
        tag tag 
            on g2t.tag_id = tag.id
where
    gallery.published = true and (
        name.value LIKE 'sometext%' or
        tag.term = 'sometext' or 
        site.`name` like 'sometext%' or
        site_tag.term = 'sometext'
    )

解释数据:

| id | select_type | table        | type   | possible_keys                                                     | key                | key_len | ref                                       | rows | Extra                              |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
|  1 | SIMPLE      | site         | index  | PRIMARY,nameIndex                                                 | nameIndex          | 258     | NULL                                      |  950 | Using index; Using temporary       |
|  1 | SIMPLE      | gallery      | ref    | PRIMARY,publishedIndex,FKF44C775296EECE37,publishedSiteIdIndex    | FKF44C775296EECE37 | 9       | production.site.id                        |   20 | Using where                        |
|  1 | SIMPLE      | g2mn         | ref    | PRIMARY,FK3EFFD7F8AFAD7A5E,FK3EFFD7F832C04188                     | FK3EFFD7F8AFAD7A5E | 8       | production.gallery.id                     |    1 | Using index; Distinct              |
|  1 | SIMPLE      | name         | eq_ref | PRIMARY,valueIndex                                                | PRIMARY            | 8       | production.g2mn.name_id                   |    1 | Distinct                           |
|  1 | SIMPLE      | g2t          | ref    | PRIMARY,FK3DDB4D63AFAD7A5E,FK3DDB4D63E210FBA6                     | FK3DDB4D63AFAD7A5E | 8       | production.g2mn.gallery_id                |    2 | Using where; Using index; Distinct |
|  1 | SIMPLE      | tag          | eq_ref | PRIMARY,termIndex                                                 | PRIMARY            | 8       | production.g2t.tag_id                     |    1 | Distinct                           |
|  1 | SIMPLE      | p2t          | ref    | PRIMARY,FK29424AB796EECE37,FK29424AB7E210FBA6                     | PRIMARY            | 8       | production.gallery.site_id                |    3 | Using where; Using index; Distinct |
|  1 | SIMPLE      | site_tag     | eq_ref | PRIMARY,termIndex                                                 | PRIMARY            | 8       | production.p2t.tag_id                     |    1 | Using where; Distinct              |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+

个人计数速度:

[SQL] select count(*) from gallery;
Affected rows: 0
Time: 0.014ms
Results: 40385

[SQL] 
select count(*) from gallery_to_name;
Affected rows: 0
Time: 0.012ms
Results: 35615

[SQL] 
select count(*) from gallery_to_tag;
Affected rows: 0
Time: 0.055ms
Results: 165104

[SQL] 
select count(*) from tag;
Affected rows: 0
Time: 0.002ms
Results: 3560    

[SQL] 
select count(*) from site;
Affected rows: 0
Time: 0.001ms
Results: 901

[SQL] 
select count(*) from site_to_tag;
Affected rows: 0
Time: 0.003ms
Results: 7026

最佳答案

我在本文末尾包含了我的测试架构和用于生成测试数据的脚本。我用过 SQL_NO_CACHE防止 MySQL 缓存查询结果的选项 - 这仅用于测试,最终应该被删除。

这个和Donnie提出的思路差不多,不过我稍微整理了一下。如果我正确理解了联接,则无需在每个选择中重复所有联接,因为每个联接实际上都独立于其他联接。原始的 WHERE 子句规定 gallery.published 必须为真,然后是一系列由 OR 连接的 4 个条件。因此,每个查询都可以单独执行。以下是四个连接:

gallery <--> gallery_to_name <--> name
gallery <--> gallery_to_tag <--> tag
gallery <--> site
gallery <--> site <--> site_to_tag <--> tag

因为 gallery 包含 site_id,在这种情况下,不需要通过 site 表进行中间连接。因此,最后一个连接可以简化为:

gallery <--> site_to_tag <--> tag

分别运行每个 SELECT,并使用 UNION 组合结果,速度非常快。这里的结果假定表结构和索引显示在本文末尾:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site AS s ON s.id = g.site_id
    WHERE g.published = TRUE AND s.name LIKE '3GRD%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
    INNER JOIN name AS n ON n.id = g2n.name_id
    WHERE g.published = TRUE AND n.value LIKE '3GRD%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
    INNER JOIN tag AS gt  ON gt.id = g2t.tag_id
    WHERE g.published = TRUE AND gt.term = '3GRD')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
    INNER JOIN tag AS st  ON st.id = s2t.tag_id
    WHERE g.published = TRUE AND st.term = '3GRD')
) AS totals;

+---------+
| matches |
+---------+
|      99 |
+---------+
1 row in set (0.00 sec)

速度确实因搜索条件而异。在以下示例中,每个表使用了不同的搜索值,LIKE 运算符必须做更多的工作,因为现在每个表都有更多的潜在匹配项:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site AS s ON s.id = g.site_id
    WHERE g.published = TRUE AND s.name LIKE '3H%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
    INNER JOIN name AS n ON n.id = g2n.name_id
    WHERE g.published = TRUE AND n.value LIKE '3G%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
    INNER JOIN tag AS gt  ON gt.id = g2t.tag_id
    WHERE g.published = TRUE AND gt.term = '3IDP')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
    INNER JOIN tag AS st  ON st.id = s2t.tag_id
    WHERE g.published = TRUE AND st.term = '3OJX')
) AS totals;

+---------+
| matches |
+---------+
|   12505 |
+---------+
1 row in set (0.24 sec)

这些结果优于使用多个连接的查询:

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name            AS n   ON n.id = g2n.name_id
INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag             AS gt  ON gt.id = g2t.tag_id
INNER JOIN site            AS s   ON s.id = g.site_id
INNER JOIN site_to_tag     AS s2t ON s2t.site_id = s.id
INNER JOIN tag             AS st  ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
    gt.term = '3GRD' OR
    st.term = '3GRD' OR
    n.value LIKE '3GRD%' OR
    s.name LIKE '3GRD%');

+---------+
| matches |
+---------+
|      99 |
+---------+
1 row in set (2.62 sec)

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name            AS n   ON n.id = g2n.name_id
INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag             AS gt  ON gt.id = g2t.tag_id
INNER JOIN site            AS s   ON s.id = g.site_id
INNER JOIN site_to_tag     AS s2t ON s2t.site_id = s.id
INNER JOIN tag             AS st  ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
    gt.term = '3IDP' OR
    st.term = '3OJX' OR
    n.value LIKE '3G%' OR
    s.name LIKE '3H%');

+---------+
| matches |
+---------+
|   12505 |
+---------+
1 row in set (3.17 sec)

架构
id 列的索引加上 site.namename.valuetag.term 很重要:

DROP SCHEMA IF EXISTS `egervari`;
CREATE SCHEMA IF NOT EXISTS `egervari`;
USE `egervari`;

-- -----------------------------------------------------
-- Table `site`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site` ;
CREATE  TABLE IF NOT EXISTS `site` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `name` VARCHAR(255) NOT NULL ,
  INDEX `name` (`name` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery` ;
CREATE  TABLE IF NOT EXISTS `gallery` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `site_id` INT UNSIGNED NOT NULL ,
  `published` TINYINT(1) NOT NULL DEFAULT 0 ,
  PRIMARY KEY (`id`) ,
  INDEX `fk_gallery_site` (`site_id` ASC) ,
  CONSTRAINT `fk_gallery_site`
    FOREIGN KEY (`site_id` )
    REFERENCES `site` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `name` ;
CREATE  TABLE IF NOT EXISTS `name` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `value` VARCHAR(255) NOT NULL ,
  INDEX `value` (`value` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `tag` ;
CREATE  TABLE IF NOT EXISTS `tag` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `term` VARCHAR(255) NOT NULL ,
  INDEX `term` (`term` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_name` ;
CREATE  TABLE IF NOT EXISTS `gallery_to_name` (
  `gallery_id` INT UNSIGNED NOT NULL ,
  `name_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`gallery_id`, `name_id`) ,
  INDEX `fk_gallery_to_name_gallery` (`gallery_id` ASC) ,
  INDEX `fk_gallery_to_name_name` (`name_id` ASC) ,
  CONSTRAINT `fk_gallery_to_name_gallery`
    FOREIGN KEY (`gallery_id` )
    REFERENCES `gallery` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_gallery_to_name_name`
    FOREIGN KEY (`name_id` )
    REFERENCES `name` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_tag` ;
CREATE  TABLE IF NOT EXISTS `gallery_to_tag` (
  `gallery_id` INT UNSIGNED NOT NULL ,
  `tag_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`gallery_id`, `tag_id`) ,
  INDEX `fk_gallery_to_tag_gallery` (`gallery_id` ASC) ,
  INDEX `fk_gallery_to_tag_tag` (`tag_id` ASC) ,
  CONSTRAINT `fk_gallery_to_tag_gallery`
    FOREIGN KEY (`gallery_id` )
    REFERENCES `gallery` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_gallery_to_tag_tag`
    FOREIGN KEY (`tag_id` )
    REFERENCES `tag` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `site_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site_to_tag` ;
CREATE  TABLE IF NOT EXISTS `site_to_tag` (
  `site_id` INT UNSIGNED NOT NULL ,
  `tag_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`site_id`, `tag_id`) ,
  INDEX `fk_site_to_tag_site` (`site_id` ASC) ,
  INDEX `fk_site_to_tag_tag` (`tag_id` ASC) ,
  CONSTRAINT `fk_site_to_tag_site`
    FOREIGN KEY (`site_id` )
    REFERENCES `site` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_site_to_tag_tag`
    FOREIGN KEY (`tag_id` )
    REFERENCES `tag` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

测试数据
这会填充 site 900 行,tag 3560 行,name 800 行,gallery 40,000 行,并将条目插入链接表:

DELIMITER //
DROP PROCEDURE IF EXISTS populate//
CREATE PROCEDURE populate()
BEGIN
    DECLARE i INT DEFAULT 0;

    WHILE i < 900 DO
        INSERT INTO site (name) VALUES (CONV(i + 1 * 10000, 20, 36));
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 3560 DO
        INSERT INTO tag (term) VALUES (CONV(i + 1 * 10000, 20, 36));
        INSERT INTO site_to_tag (site_id, tag_id) VALUES ( (i MOD 900) + 1, i + 1 );
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 800 DO
        INSERT INTO name (value) VALUES (CONV(i + 1 * 10000, 20, 36));
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 40000 DO    
        INSERT INTO gallery (site_id, published) VALUES ( (i MOD 900) + 1, i MOD 2 );
        INSERT INTO gallery_to_name (gallery_id, name_id) VALUES ( i + 1, (i MOD 800) + 1 );
        INSERT INTO gallery_to_tag (gallery_id, tag_id) VALUES ( i + 1, (i MOD 3560) + 1 );
        SET i = i + 1;
    END WHILE;
END;
//
DELIMITER ;
CALL populate();

关于sql - 如何调整表包含 30,000 多行的 7 表连接 MySQL 计数查询?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/3151410/

相关文章:

php - Yii:从一个表单中保存多个模型并检查现有记录

Laravel eloquent中的Mysql按查询求和分组

java - 通过计算数组结果来做到这一点是明智的方法吗?

java - 如何从mysql中的用户那里获取WHERE值

sql - 如何删除子表中的数据?

mysql - JOIN 用于无法装入内存的大型 mysql 数据库。

sql - 在 SQL Server 2008 R2 中 - 您可以在数据库图表工具中重命名表吗?

sql - 使用 UDF 拆分值对和创建表

MySQL 8.0 : Create a polygon as a circle around a point

MYSQL语法条件COUNT