sql - 使用 NOT IN 性能低下

标签 sql postgresql

我正在尝试进行查询,以计算每年有多少部 Actor 不是男性的电影(对于每一年,计算那一年没有男性的电影数量).

这些是表格:

ACTOR (id, fname, lname, gender)
MOVIE (id, name, year)
CASTS (pid, mid, role)  -- pid refers to actor id, mid refers to movie id

这是我编入索引的内容(表的 id 是主键,所以它们已经编入索引,或者我假设如此):

CREATE INDEX gender_index on actor(gender);
CREATE INDEX movie_name_index on movie(name);
CREATE INDEX movie_year_index on movie(year);
CREATE INDEX casts_index on casts(pid, mid, role);
CREATE INDEX casts_pid_index on casts(pid);
CREATE INDEX casts_mid_index on casts(mid);
CREATE INDEX casts_role_index on casts(role);

这是我的查询:

SELECT m.year, count(m.id)
FROM movie as m
WHERE m.id NOT IN (
    SELECT DISTINCT m.id
    FROM movie as m, casts as c, actor as a
    WHERE m.id = c.mid and a.id = c.pid and a.gender = 'M'
)
GROUP BY m.year
ORDER BY m.year

查询需要很长时间(并且永远不会完成),那么我怎样才能让它更快呢?使用 NOT EXISTS 有帮助吗,尽管我认为优化器会处理这个问题?我还需要索引其他东西吗?还有其他更好的查询吗?如果这有任何不同,我正在使用 PostgreSQL。

这是EXPLAIN:

"GroupAggregate  (cost=1512539.61..171886457832.52 rows=61 width=8)"
"  Group Key: m.year"
"  ->  Index Scan using movie_year_index on movie m  (cost=1512539.61..171886453988.38 rows=768706 width=8)"
"        Filter: (NOT (SubPlan 1))"
"        SubPlan 1"
"          ->  Materialize  (cost=1512539.18..1732298.66 rows=1537411 width=4)"
"                ->  Unique  (cost=1512539.18..1718605.60 rows=1537411 width=4)"
"                      ->  Merge Join  (cost=1512539.18..1700559.32 rows=7218511 width=4)"
"                            Merge Cond: (m_1.id = c.mid)"
"                            ->  Index Only Scan using movie_pkey on movie m_1  (cost=0.43..57863.94 rows=1537411 width=4)"
"                            ->  Materialize  (cost=1512531.37..1548623.92 rows=7218511 width=4)"
"                                  ->  Sort  (cost=1512531.37..1530577.65 rows=7218511 width=4)"
"                                        Sort Key: c.mid"
"                                        ->  Hash Join  (cost=54546.59..492838.95 rows=7218511 width=4)"
"                                              Hash Cond: (c.pid = a.id)"
"                                              ->  Seq Scan on casts c  (cost=0.00..186246.43 rows=11445843 width=8)"
"                                              ->  Hash  (cost=35248.91..35248.91 rows=1176214 width=4)"
"                                                    ->  Seq Scan on actor a  (cost=0.00..35248.91 rows=1176214 width=4)"
"                                                          Filter: ((gender)::text = 'M'::text)"

最佳答案

我会努力

SELECT m.year, count(m.id)
FROM movie m
WHERE NOT EXISTS (
    SELECT NULL
    FROM casts c, actor a
    WHERE m.id = c.mid and a.id = c.pid and a.gender = 'M'
)
GROUP BY m.year
ORDER BY m.year

关于sql - 使用 NOT IN 性能低下,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42569560/

相关文章:

php - CodeIgniter Sqlite 不工作

mysql - 为什么从 View 检索数据与该 View 的底层选择检索数据时存在性能差异

postgresql - 将 PostgreSQL 从 9.3 升级到 9.4 时如何设置区域设置

postgresql - 在 PostgreSQL 中创建只插入表

java - jOOQ 提示排除表的多重继承

mysql - 将子查询转换为连接

sql - Oracle SQL中的脚本错误

postgresql - 如何查找违反外键约束的行?

PostgreSQL 无法从服务器 : Software caused connection abort (0x00002745/10053) 接收数据

sql - 从字符串转换日期和/或时间时转换失败(返回月份的最后一天)