mysql - Best way to index and query an analytics table in MySQL

Tags: mysql sql indexing subquery analytics

I have an analytics table (5 million rows and growing) with the following structure:

Hits 
  id int(11) NOT NULL AUTO_INCREMENT,
  hit_date datetime NOT NULL,
  hit_day int(11) DEFAULT NULL,
  gender varchar(255) DEFAULT NULL,
  age_range_id int(11) DEFAULT NULL,
  klout_range_id int(11) DEFAULT NULL,
  frequency int(11) DEFAULT NULL,
  count int(11) DEFAULT NULL,
  location_id int(11) DEFAULT NULL,
  source_id int(11) DEFAULT NULL,
  target_id int(11) DEFAULT NULL,

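Most queries against this table select rows between two datetimes for a particular subset of columns and sum the count column across all matching rows. For example: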

SELECT target_id,
   SUM(CASE gender WHEN 'm' THEN count END) AS 'gender_male',
   SUM(CASE gender WHEN 'f' THEN count END) AS 'gender_female',
   SUM(CASE age_range_id WHEN 1 THEN count END) AS 'age_18 - 20',
   SUM(CASE target_id WHEN 1 THEN count END) AS 'target_test',
   SUM(CASE location_id WHEN 1 THEN count END) AS 'location_NY'
FROM Hits
WHERE (location_id = 1 OR location_id = 2)
  AND (target_id = 40 OR target_id = 22)
  AND CAST(hit_date AS date) BETWEEN '2012-5-4' AND '2012-5-10'
GROUP BY target_id

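The interesting thing about querying this table is that the WHERE clause can include any permutation of the Hits column names and values, since those are what we filter on. The specific query above gets the number of males and females aged 18 to 20 (age_range_id 1) in NY who belong to a target called "test". However, there are over 8 age groups, 10 klout ranges, 45 locations, 10 sources, and so on (all foreign key references).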

I currently have one index on hit_date and another on target_id. What is the best way to index this table properly? A composite index on all the columns seems inherently wrong.

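Is there any other way to run this query without using a subquery to sum up all the counts? From the research I did, this seems to be the best way to get the data set I need, but is there a more efficient way to handle this query?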

Best answer

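Here is your optimized query. The idea is to get rid of the ORs and the CAST() function on hit_date so that MySQL can use a compound index that covers each subset of the data. You will need a compound index on (location_id, target_id, hit_date), in that order.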

-- Outer query: add up the partial sums that each branch produced per target
SELECT target_id,
   SUM(gender_male) AS gender_male,
   SUM(gender_female) AS gender_female,
   SUM(`age_18 - 20`) AS `age_18 - 20`,
   SUM(target_test) AS target_test,
   SUM(location_NY) AS location_NY
FROM
(
-- Branch 1: location 1, target 40
SELECT target_id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 1)
  AND (target_id = 40)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id

UNION ALL

-- Branch 2: location 2, target 40
SELECT target_id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 2)
  AND (target_id = 40)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id

UNION ALL

-- Branch 3: location 1, target 22
SELECT target_id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 1)
  AND (target_id = 22)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id

UNION ALL

-- Branch 4: location 2, target 22
SELECT target_id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 2)
  AND (target_id = 22)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id
) a
GROUP BY target_id
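
For reference, the compound index this query relies on could be created roughly as follows. This is only a sketch: the index name idx_hits_loc_target_date is made up for illustration, and the column order simply follows the order given above, with the two equality columns first and the range column (hit_date) last.

```sql
-- Hypothetical index name; the column order is the one the answer asks for.
-- location_id and target_id are matched by equality, hit_date by range,
-- so hit_date goes last.
ALTER TABLE Hits
  ADD INDEX idx_hits_loc_target_date (location_id, target_id, hit_date);
```

With the range column last, each branch of the UNION ALL can seek directly to its (location_id, target_id) pair and then read only the slice of hit_date values it needs.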

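If your selection covers so much of the table that this brings no improvement, you might as well keep scanning all the rows, as you are already doing.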
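
One way to check which case you are in is to run EXPLAIN on a single branch. The statement below is a sketch that assumes the Hits table from the question and reuses one location/target pair and the date range from the query above.

```sql
-- Ask MySQL how it plans to execute one branch of the query.
EXPLAIN
SELECT target_id,
       SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male
FROM Hits
WHERE location_id = 1
  AND target_id = 40
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id;
-- In the result, the `key` column names the index MySQL chose (NULL if none),
-- `type` = ALL indicates a full table scan, and `rows` is the estimated
-- number of rows it expects to examine.
```

If the `rows` estimate is close to the size of the whole table, the index is unlikely to help much for that date range.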

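Note that the aliases are enclosed in backticks rather than single quotes, which are deprecated for this purpose. I also fixed your CASE clauses, which had count instead of 1.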
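
The two CASE forms are not interchangeable, so it is worth being clear about what each returns. The sketch below uses three made-up inline rows (not data from the question) to show the difference between summing 1 and summing the count column.

```sql
-- Inline sample rows, invented for illustration only.
SELECT
  SUM(CASE gender WHEN 'm' THEN 1 END)       AS male_rows,   -- number of matching rows
  SUM(CASE gender WHEN 'm' THEN `count` END) AS male_total   -- sum of their count values
FROM (
  SELECT 'm' AS gender, 3 AS `count`
  UNION ALL SELECT 'm', 5
  UNION ALL SELECT 'f', 2
) AS sample;
-- male_rows = 2, male_total = 8
```

Use the form with 1 when each row of Hits represents a single hit, and the form with count when each row already stores a pre-aggregated number of hits.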

Regarding "mysql - Best way to index and query an analytics table in MySQL", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10541660/
