我有以下疑问:
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
分析表有 60M 行,事务表有 3M 行。
当我对此查询运行 EXPLAIN
时,我得到:
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| # id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| '1' | 'SIMPLE' | 'analytics' | 'ref' | 'analytics_user_id | analytics_source' | 'analytics_user_id' | '5' | 'const' | '337662' | 'Using where; Using temporary; Using filesort' |
| '1' | 'SIMPLE' | 'transactions' | 'ref' | 'tran_analytics' | 'tran_analytics' | '5' | 'dijishop2.analytics.id' | '1' | NULL | |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
我不知道如何优化这个查询,因为它已经很基础了。运行此查询大约需要 70 秒。
以下是存在的索引:
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| # Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'analytics' | '0' | 'PRIMARY' | '1' | 'id' | 'A' | '56934235' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_user_id' | '1' | 'user_id' | 'A' | '130583' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_product_id' | '1' | 'product_id' | 'A' | '490812' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_affil_user_id' | '1' | 'affil_user_id' | 'A' | '55222' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_source' | '1' | 'source' | 'A' | '24604' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_country_name' | '1' | 'country_name' | 'A' | '39510' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '1' | 'id' | 'A' | '56934235' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '2' | 'user_id' | 'A' | '56934235' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '3' | 'source' | 'A' | '56934235' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| # Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'transactions' | '0' | 'PRIMARY' | '1' | 'id' | 'A' | '2436151' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_user_id' | '1' | 'user_id' | 'A' | '56654' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'transaction_id' | '1' | 'transaction_id' | 'A' | '2436151' | '191' | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_analytics' | '1' | 'analytics' | 'A' | '2436151' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_status' | '1' | 'status' | 'A' | '22' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'gordon_trans' | '1' | 'status' | 'A' | '22' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'gordon_trans' | '2' | 'analytics' | 'A' | '2436151' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
在按照建议添加任何额外索引之前简化了两个表的架构,因为它并没有改善这种情况。
CREATE TABLE `analytics` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`affil_user_id` int(11) DEFAULT NULL,
`product_id` int(11) DEFAULT NULL,
`medium` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`source` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`terms` varchar(1024) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`is_browser` tinyint(1) DEFAULT NULL,
`is_mobile` tinyint(1) DEFAULT NULL,
`is_robot` tinyint(1) DEFAULT NULL,
`browser` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`mobile` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`robot` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`platform` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`referrer` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`domain` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`ip` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`continent_code` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`country_name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`city` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `analytics_user_id` (`user_id`),
KEY `analytics_product_id` (`product_id`),
KEY `analytics_affil_user_id` (`affil_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=64821325 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `transactions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transaction_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`user_id` int(11) NOT NULL,
`pay_key` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`sender_email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`amount` decimal(10,2) DEFAULT NULL,
`currency` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`status` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`analytics` int(11) DEFAULT NULL,
`ip_address` varchar(46) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`session_id` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`eu_vat_applied` int(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `tran_user_id` (`user_id`),
KEY `transaction_id` (`transaction_id`(191)),
KEY `tran_analytics` (`analytics`),
KEY `tran_status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=10019356 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
如果以上无法进一步优化。任何关于汇总表的实现建议都会很棒。我们在 AWS 上使用 LAMP 堆栈。上述查询在 RDS (m1.large) 上运行。
最佳答案
我会创建以下索引(b-tree 索引):
analytics(user_id, source, id)
transactions(analytics, status)
这与戈登的建议不同。
索引中列的顺序很重要。
您按特定的 analytics.user_id
进行过滤,因此该字段必须是索引中的第一个。
然后按 analytics.source
分组。为了避免按 source
排序,这应该是索引的下一个字段。您还引用了 analytics.id
,因此最好将此字段作为索引的一部分,放在最后。 MySQL 是否能够仅读取索引而不接触表?我不知道,但它很容易测试。
transactions
的索引必须以 analytics
开头,因为它会在 JOIN
中使用。我们还需要status
。
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
关于mysql - 如何优化这个 MySQL 查询?数百万行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50974851/