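I have a table with the following layout:
email, item_id, json
where
email is a string,
item_id is a unix timestamp in milliseconds,
json is the item data, used with the JSON1 extension.
I also have a multi-column index on that table:
email, id
I run a lot of queries in the style of
WHERE email = 'asd' AND item_id > ... AND item_id < ...
I have worked with MongoDB for too many years, so I am used to not dealing with database normalization and just using the simplest SQL table layout.
On a phone, a query of the above style over 35000 items can take up to a second. The index is indeed being used.
Would I get a significant performance improvement by normalizing the database, that is, by creating a new table with
email, email_id
changing the original table to
email_id, item_id, json
and querying via JOINs? In that case the email, email_id table would contain around 2-5 rows and the email_id, item_id, json table thousands.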
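To make the two layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name flat, the index names idx_flat and idx_item, the example.com addresses and the sample rows are all assumptions made for illustration, and the index is assumed to be on (email, item_id). It only checks that the flat query and the JOIN query return the same rows; it says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Flat layout from the question; idx_flat is a hypothetical index name.
cur.execute("CREATE TABLE flat (email TEXT, item_id INTEGER, json TEXT)")
cur.execute("CREATE INDEX idx_flat ON flat (email, item_id)")

# Proposed normalized layout: a small lookup table plus the item table.
cur.execute("CREATE TABLE email (email_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
cur.execute("CREATE TABLE item (email_id INTEGER, item_id INTEGER, json TEXT)")
cur.execute("CREATE INDEX idx_item ON item (email_id, item_id)")

# Made-up sample data: 3 addresses, 100 items each,
# item_id is a millisecond timestamp spaced one second apart.
rows = [
    (f"user{u}@example.com", 1_500_000_000_000 + i * 1000, '{"n": %d}' % i)
    for u in range(3)
    for i in range(100)
]
cur.executemany("INSERT INTO flat VALUES (?, ?, ?)", rows)
cur.execute("INSERT INTO email (email) SELECT DISTINCT email FROM flat")
cur.execute(
    "INSERT INTO item SELECT e.email_id, f.item_id, f.json "
    "FROM flat AS f JOIN email AS e ON e.email = f.email"
)
conn.commit()

# Same filter in both layouts: one address and an open item_id range.
params = ("user1@example.com", 1_500_000_010_000, 1_500_000_020_000)

flat_rows = cur.execute(
    "SELECT item_id, json FROM flat "
    "WHERE email = ? AND item_id > ? AND item_id < ? ORDER BY item_id",
    params,
).fetchall()

joined_rows = cur.execute(
    "SELECT item.item_id, item.json FROM item "
    "JOIN email ON item.email_id = email.email_id "
    "WHERE email.email = ? AND item.item_id > ? AND item.item_id < ? "
    "ORDER BY item.item_id",
    params,
).fetchall()

print(len(flat_rows), flat_rows == joined_rows)
```

Both queries select the same items, so the choice between the layouts is purely a question of storage and lookup cost.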
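Best answer
Testing with 3 queries (the original, one using a JOIN, and an additional option that uses a subquery instead of a join to look up the email_id for the email address and compare it against email_id), the subquery came out on top and the original performed worst.
The results were:-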
SELECT * FROM original WHERE email = 'email3@ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406
OK Time: 0.199s
SELECT * FROM item WHERE email_id = (SELECT email.email_id FROM email WHERE email.email = 'email3@ouremail.com') AND item_id > 7800 AND item_id < 2404327029516376406
OK Time: 0.082s
SELECT * FROM item JOIN email ON item.email_id = email.email_id WHERE email.email = 'email3@ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406
OK Time: 0.109s
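A comparison like this can also be scripted. The following is a rough sketch with Python's sqlite3 module and time.perf_counter, not the setup that produced the timings above: it assumes an in-memory database, 20000 seeded random rows and zero-filled blobs in place of the json data, so absolute numbers will differ from device to device and from the figures quoted here.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Same three table definitions as in the answer.
cur.executescript("""
CREATE TABLE original (email TEXT, item_id INTEGER, json BLOB,
                       PRIMARY KEY (email, item_id));
CREATE TABLE email (email_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE item (email_id, item_id, json);
""")

# Assumption: 20000 rows of seeded random data, 5 addresses, unique item_ids.
rng = random.Random(0)
item_ids = rng.sample(range(1, 2**62), 20000)
rows = [
    (f"email{rng.randint(1, 5)}@ouremail.com", item_id, bytes(rng.randint(1, 40)))
    for item_id in item_ids
]
cur.executemany("INSERT INTO original VALUES (?, ?, ?)", rows)
cur.execute("INSERT INTO email SELECT DISTINCT NULL, email FROM original")
cur.execute(
    "INSERT INTO item SELECT "
    "(SELECT email_id FROM email WHERE original.email = email.email), "
    "item_id, json FROM original"
)
conn.commit()

bounds = "item_id > 7800 AND item_id < 2404327029516376406"
queries = {
    "original": "SELECT * FROM original "
                "WHERE email = 'email3@ouremail.com' AND " + bounds,
    "subquery": "SELECT * FROM item WHERE email_id = "
                "(SELECT email.email_id FROM email "
                "WHERE email.email = 'email3@ouremail.com') AND " + bounds,
    "join": "SELECT * FROM item JOIN email ON item.email_id = email.email_id "
            "WHERE email.email = 'email3@ouremail.com' AND " + bounds,
}

# Time each query once, including fetching all rows, and record the row count.
counts = {}
for name, sql in queries.items():
    start = time.perf_counter()
    fetched = cur.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    counts[name] = len(fetched)
    print(f"{name:9s} rows={len(fetched):6d} time={elapsed:.4f}s")
```

A single run is noisy, so repeating each query several times and comparing the best or median time would give a fairer picture. The row counts should be identical for all three queries.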
The following was used to create and test:-
-- Original flat layout; the composite primary key provides the (email, item_id) index.
DROP TABLE IF EXISTS original;
CREATE TABLE IF NOT EXISTS original (email TEXT, item_id INTEGER, json BLOB, PRIMARY KEY(email,item_id));
-- Generate 350000 rows of random test data: a handful of email addresses,
-- random non-negative item_ids and small random blobs standing in for the json.
WITH RECURSIVE cnt(x,y,z)
AS (
    SELECT 'email'||(1 + ABS(random() / (9223372036854775807 / 5)))||'@ouremail.com',
        ABS(random()),
        randomblob(ABS(random() / (9223372036854775807 / 40)))
    UNION ALL SELECT
        'email'||(1 + ABS(random() / (9223372036854775807 / 5)))||'@ouremail.com',
        ABS(random()),
        randomblob(ABS(random() / (9223372036854775807 / 40)))
    FROM cnt LIMIT 350000
)
INSERT INTO original SELECT * FROM cnt;
-- Normalized layout, part 1: one row per distinct email address.
DROP TABLE IF EXISTS email;
CREATE TABLE IF NOT EXISTS email (email_id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO email SELECT DISTINCT null,email FROM original;
-- Normalized layout, part 2: the items, referring to the address by email_id.
DROP TABLE IF EXISTS item;
CREATE TABLE IF NOT EXISTS item (email_id, item_id, json);
INSERT INTO item SELECT
    (SELECT email_id FROM email WHERE original.email = email.email),
    item_id,
    json FROM original;
SELECT * FROM original WHERE email = 'email3@ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
SELECT * FROM item WHERE email_id = (SELECT email.email_id FROM email WHERE email.email = 'email3@ouremail.com') AND item_id > 7800 AND item_id < 2404327029516376406;
SELECT * FROM item JOIN email ON item.email_id = email.email_id WHERE email.email = 'email3@ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
You could do worse than run the following and look at the output.
EXPLAIN QUERY PLAN SELECT * FROM original WHERE email = 'email3@ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
EXPLAIN QUERY PLAN SELECT * FROM item WHERE email_id = (SELECT email.email_id FROM email WHERE email.email = 'email3@ouremail.com') AND item_id > 7800 AND item_id < 2404327029516376406;
EXPLAIN QUERY PLAN SELECT * FROM item JOIN email ON item.email_id = email.email_id WHERE email.email = 'email3@ouremail.com' AND item_id > 7800 AND item_id < 2404327029516376406;
Regarding performance - SQLite index performance, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50826767/