mysql - Table join with LIMIT and performance improvements

Tags: mysql sql performance limit

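Background: I am writing an in-house SEO crawler to check our positions in Google. The crawling works great and the storage is fine, but I am now hitting performance problems when displaying the data (the storage table currently has over 11 million records and is more than 6.0GB in size).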

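I am trying to create a SQL query that shows every record in the input_keywords table together with the latest result from the rank_result table (for a given competitor, CompanyName) and the previous result from rank_result, so that we can see whether we are moving up or down.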

The tables are as follows.

Table: input_keywords

-------------------------------------------------------------------------------------------------------
| Field           | Type             | Null | Key | Default             | Extra                       |
-------------------------------------------------------------------------------------------------------
| id              | int(11) unsigned | NO   | PRI | NULL                | auto_increment              |
-------------------------------------------------------------------------------------------------------
| keyword         | char(150)        | YES  | UNI | NULL                |                             |
-------------------------------------------------------------------------------------------------------
| last_check      | timestamp        | YES  | MUL | 2000-01-01 00:00:00 |                             |
-------------------------------------------------------------------------------------------------------
| CREATION        | timestamp        | YES  |     | CURRENT_TIMESTAMP   |                             |
-------------------------------------------------------------------------------------------------------
| MODIFICATION    | timestamp        | YES  |     | NULL                | on update CURRENT_TIMESTAMP |
-------------------------------------------------------------------------------------------------------
| p_deep          | int(1)           | YES  |     | 5                   |                             |
-------------------------------------------------------------------------------------------------------
| check_freq_days | int(11)          | YES  |     | 3                   |                             |
-------------------------------------------------------------------------------------------------------
| type            | char(50)         | YES  |     | NULL                |                             |
-------------------------------------------------------------------------------------------------------
| competitor      | char(100)        | YES  | MUL | CompanyName         |                             |
-------------------------------------------------------------------------------------------------------

Table: rank_result

-----------------------------------------------------------------------------
| Field          | Type             | Null | Key | Default | Extra          |
-----------------------------------------------------------------------------
| id             | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
-----------------------------------------------------------------------------
| keyword        | char(150)        | YES  | MUL |         |                |
-----------------------------------------------------------------------------
| result_url     | text             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| position       | int(11)          | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| check_time     | timestamp        | YES  | MUL | NULL    |                |
-----------------------------------------------------------------------------
| useragent_used | char(255)        | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| proxy_log      | text             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| check_date     | date             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| competitor     | tinytext         | YES  |     | NULL    |                |
-----------------------------------------------------------------------------

Some sample data to show what I am trying to achieve.

Sample contents: input_keywords

-----------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword               | last_check          | CREATION            | MODIFICATION        | p_deep | check_freq_days | type | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 2  | guitar accessories    | 2017-04-06 10:34:36 | 2017-01-20 12:27:27 | 2017-04-06 08:21:02 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 3  | guitar amps           | 2017-04-06 10:46:42 | 2017-01-20 12:27:33 | 2017-04-06 08:33:08 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 4  | guitar strings        | 2017-04-06 10:50:30 | 2017-01-20 12:27:42 | 2017-04-06 08:36:56 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 5  | guitar effects pedals | 2017-04-06 11:01:44 | 2017-01-20 12:27:50 | 2017-04-06 08:48:11 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------

Sample contents: rank_result (edited to show only the relevant data)

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id    | keyword            | result_url                           | position | check_time          | useragent_used                       | proxy_log             | check_date | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 723   | guitar accessories | https://www.companyname.com/gui… | 33       | 2017-01-19 17:23:20 | Mozilla/5.0 (X11; OpenBSD i386) App… | NULL                  | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1572  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:03:45 | Mozilla/5.0 (Windows NT 6.1; rv:21.… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1672  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:08:22 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2511  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:51:25 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2656  | guitar accessories | https://www.companyname.com/gui… | 33       | 2017-01-19 19:58:08 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 5.152.200.181         | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2809  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 20:02:51 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3147  | guitar accessories | https://www.companyname.com/gui… | 36       | 2017-01-20 09:19:40 | Mozilla/5.0 (Windows NT 5.1; rv:21.… | 5.152.200.181         | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3490  | guitar accessories | https://www.companyname.com/gui… | 31       | 2017-01-20 11:26:39 | Mozilla/5.0 (compatible; MSIE 10.0;… | 185.17.148.252        | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 4530  | guitar accessories | https://www.companyname.com/gui… | 31       | 2017-01-20 11:37:53 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 185.17.148.252        | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5277  | guitar accessories | https://www.companyname.com/gui… | 34       | 2017-01-20 16:57:30 | Mozilla/5.0 (Windows NT 5.1) AppleW… | 5.152.200.181:27281   | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5480  | guitar accessories | https://www.companyname.com/gui… | 38       | 2017-01-23 12:33:32 | Mozilla/5.0 (X11; OpenBSD i386) App… | 5.152.200.181:27281   | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 9953  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-23 16:02:19 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 149.255.105.142:27281 | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 12836 | guitar accessories | https://www.companyname.com/gui… | 40       | 2017-01-23 18:03:58 | Mozilla/5.0 (X11; Linux x86_64; rv:… | 88.150.147.201:27281  | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 14470 | guitar accessories | https://www.companyname.com/gui… | 38       | 2017-01-23 23:03:55 | Mozilla/5.0 (Windows NT 6.1; WOW64;… | 185.10.202.64:27281   | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 39524 | guitar accessories | https://www.companyname.com/gui… | 32       | 2017-01-24 13:03:09 | Mozilla/5.0 (Windows; U; Windows NT… | 185.10.201.77:27281   | 2017-01-24 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sample output:

---------------------------------------------------------------------------------------------------------------------------------------------
| search_keyword              | p_deep | check_freq_days | CREATION            | last_check          | current_position | previous_position |
---------------------------------------------------------------------------------------------------------------------------------------------
| guitar accessories          | 5      | 3               | 2017-01-20 12:27:27 | 2017-07-17 09:03:43 | 37               | 39                |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitar strings     | 5      | 3               | 2017-06-23 17:44:52 | 2017-07-15 01:03:56 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitars            | 5      | 1               | 2017-01-20 12:27:17 | 2017-07-16 23:03:44 | 14               | 14                |
---------------------------------------------------------------------------------------------------------------------------------------------
| bass guitars                | 5      | 1               | 2017-01-20 12:31:56 | 2017-07-16 22:03:51 | 41               | 44                |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar Amp1 Nanotube     | 5      | 1               | 2017-01-30 17:48:34 | 2017-07-17 09:30:29 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar NanoCab           | 5      | 1               | 2017-01-30 17:48:34 | 2017-07-17 09:30:26 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a bass guitar      | 5      | 3               | 2017-05-24 22:21:40 | 2017-07-15 16:04:01 | 5                | 4                 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a guitar           | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:19:02 | 24               | 24                |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an acoustic guitar | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:18:33 | 12               | 12                |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an electric guitar | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:18:51 | 10               | 11                |
---------------------------------------------------------------------------------------------------------------------------------------------

My current query is:

SELECT i.`keyword` AS 'search_keyword',  i.`p_deep`, i.`check_freq_days`, i.`CREATION`, i.`last_check`,
              (SELECT r.position AS 'current_position' FROM rank_result r where r.`keyword` = search_keyword AND r.`competitor` = 'CompanyName' AND i.`last_check` = r.`check_time` ORDER BY r.check_time DESC LIMIT 0,1) AS 'current_position',
              (SELECT rr.`position` AS 'previous_position' FROM rank_result rr WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' ORDER BY rr.check_time DESC LIMIT 1,1) AS 'previous_position'
              FROM input_keywords i
              WHERE i.keyword LIKE "%s"
              order by i.keyword ASC
              LIMIT 0,100

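So my questions are:

  1. Is there a better way to write this query?
  2. I have to limit this to 100 results, otherwise the query takes too long and times out. Can this be solved?
  3. Without ORDER BY rr.check_time DESC the query is hundreds of times faster, but it obviously does not return the right information, because it gets the first record instead of the last. Can I do this in a different way?
  4. I would really like to drop the WHERE keyword LIKE filter and simply return all of my input_keywords with their current and previous rankings.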

Additional information:

Returning the current rank for a keyword:

***input_keywords          rank_result***
    keyword           ==    keyword
    last_check        ==    check_time (this makes sure that if we drop off the search results I don't keep returning an incorrect figure)
    competitor        ==    competitor (this allows us to monitor us and our competitors.)

Returning the previous rank for a keyword:

***input_keywords          rank_result***
    keyword           ==    keyword
    competitor        ==    competitor (this allows us to monitor us and our competitors.)
    ORDER BY check_time desc
    LIMIT 1,1 (to get the last but one result)
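
As a rough sketch, the two lookups described above could be written as standalone queries for a single keyword. The literal 'guitar accessories' is just a placeholder taken from the sample data, and no particular output is implied:

```sql
-- Current position: the rank_result row whose check_time equals
-- input_keywords.last_check for the same keyword.
SELECT r.position
FROM rank_result r
JOIN input_keywords i
  ON i.keyword = r.keyword
 AND i.last_check = r.check_time
WHERE i.keyword = 'guitar accessories'   -- placeholder keyword
  AND r.competitor = 'CompanyName'
LIMIT 1;

-- Previous position: the second most recent row for the keyword.
SELECT rr.position
FROM rank_result rr
WHERE rr.keyword = 'guitar accessories'  -- placeholder keyword
  AND rr.competitor = 'CompanyName'
ORDER BY rr.check_time DESC
LIMIT 1 OFFSET 1;                        -- equivalent to LIMIT 1,1
```

If a keyword dropped out of the results on the last check, the first query returns no row, which is the NULL shown in the sample output.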

Please be kind - I am self-taught in all of this!

Edit 1.

EXPLAIN EXTENDED for my current query (I have also included the CREATE statements)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type        | table | type | possible_keys                | key     | key_len | ref                          | rows | filtered | Extra                       |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1  | PRIMARY            | i     | ALL  | NULL                         | NULL    | NULL    | NULL                         | 1682 | 100.00   | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3  | DEPENDENT SUBQUERY | rr    | ref  | keyword                      | keyword | 451     | func                         | 32   | 100.00   | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2  | DEPENDENT SUBQUERY | r     | ref  | keyword,idx_rank_result_che… | keyword | 609     | func,GoogleCrawler.i.last_c… | 2    | 100.00   | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

CREATE TABLE `input_keywords` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` char(150) DEFAULT NULL COMMENT 'the keyword....',
  `last_check` timestamp NULL DEFAULT '2000-01-01 00:00:00' COMMENT 'Last check timestamp, default to years ago so we check immediatly',
  `CREATION` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `MODIFICATION` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  `p_deep` int(1) DEFAULT '5' COMMENT 'how many pages deep to search - default 5',
  `check_freq_days` int(11) DEFAULT '3' COMMENT 'how often to check this keyword in DAYS default 3',
  `type` char(50) DEFAULT NULL COMMENT 'Product, Category, other etc',
  `competitor` tinytext,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNQ_Keyword` (`keyword`),
  KEY `keyword` (`keyword`(100),`last_check`,`competitor`(100))
) ENGINE=InnoDB AUTO_INCREMENT=6001 DEFAULT CHARSET=utf8;


CREATE TABLE `rank_result` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` char(150) DEFAULT '',
  `result_url` text,
  `position` int(11) DEFAULT NULL,
  `check_time` timestamp NULL DEFAULT NULL,
  `useragent_used` char(255) DEFAULT NULL,
  `proxy_log` text,
  `check_date` date DEFAULT NULL COMMENT 'date of the check - easier for graph plotting',
  `competitor` tinytext,
  PRIMARY KEY (`id`),
  KEY `keyword` (`keyword`,`check_time`,`competitor`(50)),
  KEY `idx_rank_result_check_time` (`check_time`)
) ENGINE=InnoDB AUTO_INCREMENT=11444318 DEFAULT CHARSET=utf8;

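Edit 2:

Based on the two answers so far, I have adjusted my index on rank_result and added a restriction on the time range. I now get results in <1 second, which is an amazing result.

However, I still feel my query looks really "hacky", and that there must be a better, cleaner solution - is there?

(The query currently in production)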

SELECT i.`keyword` AS search_keyword,  i.p_deep, i.check_freq_days, 
i.CREATION, i.last_check,
        (SELECT r.position
         FROM rank_result r 
         WHERE r.`keyword` = search_keyword AND
               r.`competitor` = 'Absolute' AND
               i.`last_check` = r.`check_time`
         ORDER BY r.check_time DESC
         LIMIT 0,1
        ) AS 'current_position',
        (SELECT rr.`position`
         FROM rank_result rr
         WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'Absolute' AND check_time > (NOW() - INTERVAL 2 WEEK)
         ORDER BY rr.check_time DESC
         LIMIT 1, 1
        ) AS 'previous_position'
        FROM input_keywords i
        ORDER BY i.keyword ASC

Best answer

For this query:

SELECT i.`keyword` AS search_keyword,  i.p_deep, i.check_freq_days, i.CREATION, i.last_check,
        (SELECT r.position
         FROM rank_result r 
         WHERE r.`keyword` = search_keyword AND
               r.`competitor` = 'CompanyName' AND
               i.`last_check` = r.`check_time`
         ORDER BY r.check_time DESC
         LIMIT 0,1
        ) AS current_position,
        (SELECT rr.`position`
         FROM rank_result rr
         WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName'
         ORDER BY rr.check_time DESC
         LIMIT 1, 1
        ) AS previous_position
FROM input_keywords i
WHERE i.keyword LIKE "%s"
ORDER BY i.keyword ASC
LIMIT 0, 100;

You want an index on rank_result(keyword, competitor, check_time, position).
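
A minimal sketch of adding such an index, assuming the schema shown in the question. The index name is a placeholder. Since competitor is declared as tinytext in rank_result, MySQL requires a prefix length for it in an index, so the sketch uses competitor(50), the same prefix the existing keyword index uses. With a prefix, MySQL may still have to read the row to compare the full value, so it is worth checking the plan with EXPLAIN afterwards:

```sql
-- Hypothetical index name. Column order follows the answer:
-- equality columns first (keyword, competitor), then the sort
-- column (check_time), then the selected column (position).
ALTER TABLE rank_result
  ADD INDEX idx_kw_comp_time_pos (keyword, competitor(50), check_time, position);

-- Inspect how the "previous position" lookup is executed for one keyword
-- ('guitar accessories' is a placeholder from the sample data).
EXPLAIN
SELECT rr.position
FROM rank_result rr
WHERE rr.keyword = 'guitar accessories'
  AND rr.competitor = 'CompanyName'
ORDER BY rr.check_time DESC
LIMIT 1, 1;
```

Changing competitor to a VARCHAR column would let it be indexed in full, without a prefix. That is a schema change the answer does not mention, so treat it as an option rather than a recommendation.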

This question, "mysql - Table join with LIMIT and performance improvements", comes from Stack Overflow: https://stackoverflow.com/questions/45144823/
