Today I made some changes to a table to try to make certain kinds of queries run faster. Here is the table (before my change):
CREATE TABLE IF NOT EXISTS street_addresses (
id INTEGER PRIMARY KEY NOT NULL,
house_number INTEGER NOT NULL,
entrance TEXT NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL,
street_name INTEGER NOT NULL REFERENCES street_names(id),
postal_code INTEGER NOT NULL REFERENCES postal_codes(id),
city INTEGER NOT NULL REFERENCES cities(id),
municipality INTEGER NOT NULL REFERENCES municipalities(id),
CONSTRAINT unique_address UNIQUE(
street_name, house_number, entrance, postal_code, city
)
)
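The table has two indexes (that I can identify): the primary key and the unique key spanning 5 columns. I often need to look up street addresses using only the house number and postal code columns, or the house number and city columns, so I changed the table creation SQL to: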
CREATE TABLE IF NOT EXISTS street_addresses (
id INTEGER PRIMARY KEY NOT NULL,
house_number INTEGER NOT NULL,
entrance TEXT NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL,
street_name INTEGER NOT NULL REFERENCES street_names,
postal_code INTEGER NOT NULL REFERENCES postal_codes,
city INTEGER NOT NULL REFERENCES cities,
municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
ON street_addresses (
street_name, house_number, entrance, postal_code, city
);
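I added two indexes and moved the UNIQUE index out of the table definition (so that I have all my keys in one place). I also removed the (id) from the REFERENCES lines, because according to the documentation it defaults to the primary key. My database is now noticeably larger, but at least fetching addresses by house number and postal code is dozens of times faster!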
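As a side note on moving the UNIQUE constraint out of the table definition: both forms give SQLite a unique index, and the main visible difference is the index name. The following is a minimal sketch using Python's `sqlite3` module; the two-column tables `with_constraint` and `with_explicit_index` are hypothetical stand-ins, not the real schema.

```python
import sqlite3


def index_names(conn, table):
    """Return the names of all indexes SQLite reports for `table`."""
    # PRAGMA index_list yields one row per index; column 1 is the index name.
    return {row[1] for row in conn.execute(f"PRAGMA index_list('{table}')")}


conn = sqlite3.connect(":memory:")

# Hypothetical table: uniqueness declared as a table constraint.
conn.execute("""
    CREATE TABLE with_constraint (
        id INTEGER PRIMARY KEY NOT NULL,
        street_name INTEGER NOT NULL,
        house_number INTEGER NOT NULL,
        CONSTRAINT unique_address UNIQUE (street_name, house_number)
    )
""")

# Hypothetical table: uniqueness declared as a separate, named index.
conn.execute("""
    CREATE TABLE with_explicit_index (
        id INTEGER PRIMARY KEY NOT NULL,
        street_name INTEGER NOT NULL,
        house_number INTEGER NOT NULL
    )
""")
conn.execute("""
    CREATE UNIQUE INDEX sa_unique_address
    ON with_explicit_index (street_name, house_number)
""")

constraint_indexes = index_names(conn, "with_constraint")
explicit_indexes = index_names(conn, "with_explicit_index")
print(constraint_indexes)
print(explicit_indexes)
```

The constraint form produces an automatically named index, while the explicit form keeps the name given in `CREATE UNIQUE INDEX`, which is what makes it possible to refer to the index by name later. The `INTEGER PRIMARY KEY` column is an alias for the rowid and does not show up as a separate index here.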
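Unfortunately, queries that search by street name and house number no longer seem to use my index, and this is the most common kind of query in my database. Before the table change I got about 1700 reads per second using street name and house number; now I get about 50. If I search using all 5 columns I still get the good old speed, but using only the first 2 columns of the UNIQUE key is now very slow.
Also, queries using house number and city are still about as slow as before, and much slower than searches using house number and postal code.
Any idea how this happened? Do I need to define a new index for street name and house number, even though those columns are part of the UNIQUE key? If so, why were my queries so fast before? And why don't house number and city queries enjoy the same speed-up as house number and postal code queries?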
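On the question of whether the first two columns of the 5-column UNIQUE key can serve a lookup on their own: in the simple case of direct equality on those columns, they can. The sketch below checks this with `EXPLAIN QUERY PLAN` on a cut-down, empty copy of the table. It assumes a filter directly on `sa.street_name`, which is not what the real query does (the real query filters `sn.name` through a join), so it only illustrates the leftmost-prefix rule and does not reproduce the slowdown.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")

# Cut-down copy of street_addresses with only the 5-column unique index.
conn.execute("""
    CREATE TABLE street_addresses (
        id INTEGER PRIMARY KEY NOT NULL,
        house_number INTEGER NOT NULL,
        entrance TEXT NOT NULL,
        street_name INTEGER NOT NULL,
        postal_code INTEGER NOT NULL,
        city INTEGER NOT NULL
    )
""")
conn.execute("""
    CREATE UNIQUE INDEX sa_unique_address ON street_addresses (
        street_name, house_number, entrance, postal_code, city
    )
""")

# Equality on the two leading index columns only.
prefix_plan = plan_details(
    conn,
    "SELECT id FROM street_addresses WHERE street_name = ? AND house_number = ?",
    (1, 11),
)
for detail in prefix_plan:
    print(detail)
```

If the printed plan names `sa_unique_address`, the leading columns are usable without a dedicated (street_name, house_number) index, which suggests the slowdown comes from which index the planner picks once other candidates exist.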
Sorry for the wall of text. I hope someone can help. The select query I am using is included below.
My benchmarks:
Before table change:
$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number
[ ============================ 100% (10000/10000) ============================ ]
5.9129 seconds
0.0006 seconds per interval
1691 intervals per second

$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city
[ ============================ 100% (10000/10000) ============================ ]
3.2198 seconds
0.0003 seconds per interval
3106 intervals per second

$ bin/benchmark_norway_database --search-by-components 100 --house_number --postal_code
[ ============================== 100% (100/100) ============================== ]
9.957 seconds
0.0996 seconds per interval
10 intervals per second

$ bin/benchmark_norway_database --search-by-components 100 --house_number --city
[ ============================== 100% (100/100) ============================== ]
10.2446 seconds
0.1024 seconds per interval
10 intervals per second
After table change:
# This is now so dreadfully slow I can't do 10000 intervals.
$ bin/benchmark_norway_database --search-by-components 500 --street_name --house_number
[ ============================== 100% (500/500) ============================== ]
9.5749 seconds
0.0191 seconds per interval
52 intervals per second

# Still fast!
$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city
[ ============================ 100% (10000/10000) ============================ ]
3.4125 seconds
0.0003 seconds per interval
2930 intervals per second

# Much, much faster than before!
$ bin/benchmark_norway_database --search-by-components 10000 --house_number --postal_code
[ ============================ 100% (10000/10000) ============================ ]
22.2646 seconds
0.0022 seconds per interval
449 intervals per second

# Still slow? Why? :S
$ bin/benchmark_norway_database --search-by-components 500 --house_number --city
[ ============================== 100% (500/500) ============================== ]
14.3483 seconds
0.0287 seconds per interval
35 intervals per second
My select query:
SELECT
sn.name, sa.house_number, sa.entrance, pc.postal_code,
ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
street_addresses AS sa
INNER JOIN street_names AS sn ON sa.street_name = sn.id
INNER JOIN postal_codes AS pc ON sa.postal_code = pc.id
INNER JOIN cities AS ci ON sa.city = ci.id
INNER JOIN municipalities AS mu ON sa.municipality = mu.id
INNER JOIN counties AS co ON mu.county = co.id
WHERE
...
ORDER BY
ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
0, 100
Note: in the WHERE clause, I use GLOB when searching by street name, for example:
WHERE
sn.name GLOB "FORNEBUVEIEN" AND
sa.house_number = 11
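For readers unfamiliar with GLOB: it is a case-sensitive pattern match that uses `*` and `?` as wildcards, so a pattern without wildcards (as in the example above) only matches the exact string. A minimal sketch of that behaviour, using a hypothetical helper `glob_match`:

```python
import sqlite3


def glob_match(conn, text, pattern):
    """Evaluate `text GLOB pattern` in SQLite and return the result as a bool."""
    return bool(conn.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()[0])


conn = sqlite3.connect(":memory:")

# No wildcards: behaves like a case-sensitive equality test.
exact = glob_match(conn, "FORNEBUVEIEN", "FORNEBUVEIEN")
case_differs = glob_match(conn, "Fornebuveien", "FORNEBUVEIEN")

# '*' matches any run of characters, so this is a prefix search.
prefix = glob_match(conn, "FORNEBUVEIEN", "FORNEBU*")

print(exact, case_differs, prefix)
```

Note that in the query above the pattern is applied to `sn.name` in `street_names`, so the only direct filter on `street_addresses` itself is `house_number`.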
All my table schemas, in case they are relevant:
CREATE TABLE IF NOT EXISTS counties (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS municipalities (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
number INTEGER NOT NULL,
county INTEGER NOT NULL REFERENCES counties,
CONSTRAINT unique_municipality UNIQUE(name, county)
);
CREATE UNIQUE INDEX IF NOT EXISTS mu_number
ON municipalities (number);
CREATE UNIQUE INDEX IF NOT EXISTS mu_unique_name_co
ON municipalities (name, county);
CREATE TABLE IF NOT EXISTS cities (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE UNIQUE INDEX IF NOT EXISTS ci_unique_name_mu
ON cities (name, municipality);
CREATE TABLE IF NOT EXISTS postal_codes (
id INTEGER PRIMARY KEY NOT NULL,
postal_code INTEGER NOT NULL,
city INTEGER NOT NULL REFERENCES cities
);
CREATE UNIQUE INDEX IF NOT EXISTS po_postal_code
ON postal_codes (postal_code);
CREATE TABLE IF NOT EXISTS street_names (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS sn_name
ON street_names (name);
CREATE TABLE IF NOT EXISTS street_addresses (
id INTEGER PRIMARY KEY NOT NULL,
house_number INTEGER NOT NULL,
entrance TEXT NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL,
street_name INTEGER NOT NULL REFERENCES street_names,
postal_code INTEGER NOT NULL REFERENCES postal_codes,
city INTEGER NOT NULL REFERENCES cities,
municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
ON street_addresses (
street_name, house_number, entrance, postal_code, city
);
I run these commands after importing all the data:
PRAGMA journal_mode = OFF
PRAGMA page_size = 65536
VACUUM
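A new page size only takes effect once the database is rebuilt, which is why VACUUM follows the `page_size` pragma above. The sketch below is one way to confirm that the settings were applied; the function names `finalize_database` and `demo` and the throwaway database file are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def finalize_database(path):
    """Apply the post-import pragmas and report the resulting settings."""
    # isolation_level=None keeps the connection in autocommit mode,
    # because VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        journal_mode = conn.execute("PRAGMA journal_mode = OFF").fetchone()[0]
        conn.execute("PRAGMA page_size = 65536")
        conn.execute("VACUUM")  # rebuilds the file using the new page size
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
    return journal_mode, page_size


def demo():
    """Create a small throwaway database and finalize it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE street_names ("
            "id INTEGER PRIMARY KEY NOT NULL, name TEXT NOT NULL)"
        )
        conn.execute("INSERT INTO street_names (name) VALUES ('FORNEBUVEIEN')")
        conn.commit()
        conn.close()
        return finalize_database(path)
```

Calling `demo()` returns the journal mode and page size SQLite reports after the VACUUM, which can be compared against the values requested.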
EXPLAIN QUERY PLAN when using street name and house number:
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT
   ...> sn.name, sa.house_number, sa.entrance, pc.postal_code,
   ...> ci.name, mu.name, co.name, sa.latitude, sa.longitude
   ...> FROM
   ...> street_addresses AS sa
   ...> INNER JOIN street_names AS sn ON sa.street_name = sn.id
   ...> INNER JOIN postal_codes AS pc ON sa.postal_code = pc.id
   ...> INNER JOIN cities AS ci ON sa.city = ci.id
   ...> INNER JOIN municipalities AS mu ON sa.municipality = mu.id
   ...> INNER JOIN counties AS co ON mu.county = co.id
   ...> WHERE
   ...> sn.name GLOB "FORNEBUVEIEN" AND
   ...> sa.house_number=11
   ...> ORDER BY
   ...> ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
   ...> LIMIT
   ...> 0, 100;
selectid    order       from        detail
----------  ----------  ----------  -------------------------------------------------------------------------
0           0           0           SEARCH TABLE street_addresses AS sa USING INDEX sa_hn_ci (house_number=?)
0           1           1           SEARCH TABLE street_names AS sn USING INTEGER PRIMARY KEY (rowid=?)
0           2           2           SEARCH TABLE postal_codes AS pc USING INTEGER PRIMARY KEY (rowid=?)
0           3           3           SEARCH TABLE cities AS ci USING INTEGER PRIMARY KEY (rowid=?)
0           4           4           SEARCH TABLE municipalities AS mu USING INTEGER PRIMARY KEY (rowid=?)
0           5           5           SEARCH TABLE counties AS co USING INTEGER PRIMARY KEY (rowid=?)
0           0           0           USE TEMP B-TREE FOR ORDER BY
Best answer
It turns out that when my SELECT query uses a WHERE clause like this:
WHERE
sn.name GLOB ? AND
sa.house_number = ?
SQLite3 picks the index sa_hn_ci (house_number, city) instead of sa_unique_address. This makes the query run about 100 times slower.
Whenever my query includes a street name, I now work around this by using INDEXED BY:
SELECT
sn.name, sa.house_number, sa.entrance, pc.postal_code,
ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
street_addresses AS sa INDEXED BY sa_unique_address -- This line!
INNER JOIN street_names AS sn ON sa.street_name = sn.id
INNER JOIN postal_codes AS pc ON sa.postal_code = pc.id
INNER JOIN cities AS ci ON sa.city = ci.id
INNER JOIN municipalities AS mu ON sa.municipality = mu.id
INNER JOIN counties AS co ON mu.county = co.id
WHERE
sn.name GLOB "FORNEBUVEIEN" AND
sa.house_number=11
ORDER BY
ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
0, 100;
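The effect of the INDEXED BY hint can be checked without loading any data by comparing query plans. The following sketch uses a cut-down, empty version of the schema (only `street_names` and `street_addresses`, with the other joins omitted), so the default plan it prints may differ from the one seen on the real database. The helper `plan_details` and the `QUERY` template are illustrative names.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE street_names (
        id INTEGER PRIMARY KEY NOT NULL,
        name TEXT NOT NULL
    );
    CREATE UNIQUE INDEX sn_name ON street_names (name);

    CREATE TABLE street_addresses (
        id INTEGER PRIMARY KEY NOT NULL,
        house_number INTEGER NOT NULL,
        entrance TEXT NOT NULL,
        street_name INTEGER NOT NULL REFERENCES street_names,
        postal_code INTEGER NOT NULL,
        city INTEGER NOT NULL
    );
    CREATE INDEX sa_hn_pc ON street_addresses (house_number, postal_code);
    CREATE INDEX sa_hn_ci ON street_addresses (house_number, city);
    CREATE UNIQUE INDEX sa_unique_address ON street_addresses (
        street_name, house_number, entrance, postal_code, city
    );
""")

# {hint} is either empty or an INDEXED BY clause for street_addresses.
QUERY = """
    SELECT sn.name, sa.house_number, sa.entrance
    FROM street_addresses AS sa {hint}
    INNER JOIN street_names AS sn ON sa.street_name = sn.id
    WHERE sn.name GLOB 'FORNEBUVEIEN' AND sa.house_number = 11
"""

default_plan = plan_details(conn, QUERY.format(hint=""))
forced_plan = plan_details(conn, QUERY.format(hint="INDEXED BY sa_unique_address"))

print("default:", default_plan)
print("forced: ", forced_plan)
```

With the hint in place, SQLite either uses the named index for `street_addresses` or refuses to prepare the statement, so a misspelled or dropped index shows up as an error instead of a silent fallback to a slower plan.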
But I don't know why SQLite3 chose the wrong index in the first place. Running ANALYZE did not change anything.
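When ANALYZE appears to have no effect, it can help to look at what it actually recorded. SQLite stores the statistics in the `sqlite_stat1` table, one row per index. This is a minimal sketch on a cut-down table with made-up rows; the data is synthetic and only serves to give ANALYZE something to measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE street_addresses (
        id INTEGER PRIMARY KEY NOT NULL,
        house_number INTEGER NOT NULL,
        city INTEGER NOT NULL
    )
""")
conn.execute("CREATE INDEX sa_hn_ci ON street_addresses (house_number, city)")

# Synthetic rows (not real address data).
conn.executemany(
    "INSERT INTO street_addresses (house_number, city) VALUES (?, ?)",
    [(n % 50, n % 7) for n in range(1000)],
)
conn.commit()

# ANALYZE gathers per-index statistics for the query planner.
conn.execute("ANALYZE")

# Map each index name to its statistics string.
stats = dict(
    conn.execute(
        "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'street_addresses'"
    ).fetchall()
)
for index_name, stat in stats.items():
    print(index_name, stat)
```

Each `stat` value is a space-separated list of integers that starts with a row count and continues with average rows per distinct value for successive index prefixes. If an expected index is missing from the output, ANALYZE has not been run on the database the queries use.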
I am not marking this answer as the correct one.
Regarding "sql - Why does changing my table schema slow down my queries?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23657721/