sql - Why did changing my table schema slow down my queries?

Tags: sql database performance sqlite

Today I made some changes to a table in an attempt to make certain kinds of queries run faster. This is the table (before my changes):

CREATE TABLE IF NOT EXISTS street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names(id),
  postal_code INTEGER NOT NULL REFERENCES postal_codes(id),
  city INTEGER NOT NULL REFERENCES cities(id),
  municipality INTEGER NOT NULL REFERENCES municipalities(id),
  CONSTRAINT unique_address UNIQUE(
    street_name, house_number, entrance, postal_code, city
  )
)

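The table has two indexes (that I can identify): the primary key and the unique key spanning 5 columns. I often need to look up street addresses using only the house number and postal code columns, or the house number and city columns, so I changed the table creation SQL to: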

CREATE TABLE IF NOT EXISTS street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names,
  postal_code INTEGER NOT NULL REFERENCES postal_codes,
  city INTEGER NOT NULL REFERENCES cities,
  municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
  ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
  ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
  ON street_addresses (
    street_name, house_number, entrance, postal_code, city
  );

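I added two indexes and moved the UNIQUE index out of the table definition (so that I have all my keys in one place). I also removed (id) from the REFERENCES clauses because, according to the documentation, a foreign key references the primary key by default. My database is now noticeably larger, but at least fetching addresses by house number and postal code is dozens of times faster!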
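As a side check on the "two indexes" observation: SQLite backs a table-level UNIQUE constraint with an automatically created index, so moving the constraint into an explicit CREATE UNIQUE INDEX mainly changes the name of that index. The sketch below assumes Python's standard sqlite3 module, an in-memory database, and a trimmed-down copy of the table (no coordinates, no foreign keys), so it is an illustration rather than the full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  street_name INTEGER NOT NULL,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL,
  CONSTRAINT unique_address UNIQUE(
    street_name, house_number, entrance, postal_code, city
  )
);
""")

# In PRAGMA index_list output, the second column of each row is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('street_addresses')")]

# In PRAGMA index_info output, the third column of each row is the column name.
index_columns = [
    row[2]
    for row in conn.execute("PRAGMA index_info('sqlite_autoindex_street_addresses_1')")
]

print(index_names)    # the automatic index created for the UNIQUE constraint
print(index_columns)  # its columns, in constraint order
```

The INTEGER PRIMARY KEY does not show up here because it is an alias for the rowid and needs no separate index.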

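Unfortunately, queries that search by street name and house number no longer seem to use my index, and that is the most common kind of query against my database. Before the table change I got about 1700 reads per second using street name and house number; now I get about 50. If I search using all 5 columns I still get the good old speed, but using only the first 2 columns of the UNIQUE key is now very slow.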

Also, queries using house number and city are still about as slow as before, far slower than searches using house number and postal code.

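Any idea how this happened? Do I need to define a new index on street name and house number, even though those columns are part of the UNIQUE key? If so, why were my queries so fast before? And why doesn't the house number and city query get the same speed-up as the house number and postal code query?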
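On whether a leading subset of the UNIQUE key's columns can be served by that key's index: one way to check is to ask SQLite for its plan on a simplified single-table query. The sketch below assumes Python's sqlite3 module, an empty in-memory copy of the address table trimmed to the indexed columns, and a direct filter on the street_name id instead of the GLOB join used in the real query. The helper name plan is made up for this example. It illustrates the mechanism only and does not reproduce the slowdown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  street_name INTEGER NOT NULL,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL
);
CREATE UNIQUE INDEX sa_unique_address
  ON street_addresses (street_name, house_number, entrance, postal_code, city);
""")

def plan(sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row (always the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Filter on the first two columns of the index: a leftmost prefix.
prefix_plan = plan(
    "SELECT id FROM street_addresses WHERE street_name = ? AND house_number = ?",
    (1, 11),
)

# Filter that skips the leading column (street_name) of the only index.
non_prefix_plan = plan(
    "SELECT id FROM street_addresses WHERE house_number = ? AND city = ?",
    (11, 1),
)

print(prefix_plan)      # a SEARCH that names sa_unique_address
print(non_prefix_plan)  # a SCAN; exact wording varies by SQLite version
```

The first plan suggests that a separate (street_name, house_number) index is not required for the index to be usable; whether the planner actually picks it in a multi-table join is a separate matter.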

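Sorry for the wall of text. I hope someone can help. The select query I'm using is shown after the benchmarks.


My benchmarks:

Before table change: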

$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number
[ ============================ 100% (10000/10000) ============================ ]
5.9129 seconds
0.0006 seconds per interval
1691 intervals per second

$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city
[ ============================ 100% (10000/10000) ============================ ]
3.2198 seconds
0.0003 seconds per interval
3106 intervals per second

$ bin/benchmark_norway_database --search-by-components 100 --house_number --postal_code
[ ============================== 100% (100/100) ============================== ]
9.957 seconds
0.0996 seconds per interval
10 intervals per second

$ bin/benchmark_norway_database --search-by-components 100 --house_number --city
[ ============================== 100% (100/100) ============================== ]
10.2446 seconds
0.1024 seconds per interval
10 intervals per second

After table change:

# This is now so dreadfully slow I can't do 10000 intervals.
$ bin/benchmark_norway_database --search-by-components 500 --street_name --house_number
[ ============================== 100% (500/500) ============================== ]
9.5749 seconds
0.0191 seconds per interval
52 intervals per second

# Still fast!
$ bin/benchmark_norway_database --search-by-components 10000 --street_name --house_number --entrance --postal_code --city
[ ============================ 100% (10000/10000) ============================ ]
3.4125 seconds
0.0003 seconds per interval
2930 intervals per second

# Much, much faster than before!
$ bin/benchmark_norway_database --search-by-components 10000 --house_number --postal_code
[ ============================ 100% (10000/10000) ============================ ]
22.2646 seconds
0.0022 seconds per interval
449 intervals per second

# Still slow? Why? :S
$ bin/benchmark_norway_database --search-by-components 500 --house_number --city
[ ============================== 100% (500/500) ============================== ]
14.3483 seconds
0.0287 seconds per interval
35 intervals per second

My select query:

SELECT
  sn.name, sa.house_number, sa.entrance, pc.postal_code,
  ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
  street_addresses AS sa
  INNER JOIN street_names   AS sn ON sa.street_name  = sn.id
  INNER JOIN postal_codes   AS pc ON sa.postal_code  = pc.id
  INNER JOIN cities         AS ci ON sa.city         = ci.id
  INNER JOIN municipalities AS mu ON sa.municipality = mu.id
  INNER JOIN counties       AS co ON mu.county       = co.id
WHERE
  ...
ORDER BY
  ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
  0, 100

Note: in the WHERE part, I use GLOB when searching on street names, for example:

WHERE
  sn.name GLOB 'FORNEBUVEIEN' AND
  sa.house_number = 11

All of my table schemas, in case they are relevant:

CREATE TABLE IF NOT EXISTS counties (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT UNIQUE NOT NULL
);

CREATE TABLE IF NOT EXISTS municipalities (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL,
  number INTEGER NOT NULL,
  county INTEGER NOT NULL REFERENCES counties,
  CONSTRAINT unique_municipality UNIQUE(name, county)
);
CREATE UNIQUE INDEX IF NOT EXISTS mu_number
  ON municipalities (number);
CREATE UNIQUE INDEX IF NOT EXISTS mu_unique_name_co
  ON municipalities (name, county);

CREATE TABLE IF NOT EXISTS cities (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL,
  municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE UNIQUE INDEX IF NOT EXISTS ci_unique_name_mu
  ON cities (name, municipality);

CREATE TABLE IF NOT EXISTS postal_codes (
  id INTEGER PRIMARY KEY NOT NULL,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL REFERENCES cities
);
CREATE UNIQUE INDEX IF NOT EXISTS po_postal_code
  ON postal_codes (postal_code);

CREATE TABLE IF NOT EXISTS street_names (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS sn_name
  ON street_names (name);

CREATE TABLE IF NOT EXISTS street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names,
  postal_code INTEGER NOT NULL REFERENCES postal_codes,
  city INTEGER NOT NULL REFERENCES cities,
  municipality INTEGER NOT NULL REFERENCES municipalities
);
CREATE INDEX IF NOT EXISTS sa_hn_pc
  ON street_addresses (house_number, postal_code);
CREATE INDEX IF NOT EXISTS sa_hn_ci
  ON street_addresses (house_number, city);
CREATE UNIQUE INDEX IF NOT EXISTS sa_unique_address
  ON street_addresses (
    street_name, house_number, entrance, postal_code, city
  );

I run these commands after importing all the data:

PRAGMA journal_mode = OFF
PRAGMA page_size = 65536
VACUUM

EXPLAIN QUERY PLAN when searching by street name and house number:

sqlite> EXPLAIN QUERY PLAN SELECT sn.name, sa.house_number, sa.entrance, pc.postal_code, ci.name, mu.name, co.name, sa.latitude, sa.longitude FROM street_addresses AS sa INNER JOIN street_names   AS sn ON sa.street_name  = sn.id INNER JOIN postal_codes   AS pc ON sa.postal_code  = pc.id INNER JOIN cities         AS ci ON sa.city         = ci.id INNER JOIN municipalities AS mu ON sa.municipality = mu.id INNER JOIN counties       AS co ON mu.county       = co.id WHERE sn.name GLOB "FORNEBUVEIEN" AND sa.house_number=11 ORDER BY ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC LIMIT 0, 100;
selectid    order       from        detail                                                                   
----------  ----------  ----------  -------------------------------------------------------------------------
0           0           0           SEARCH TABLE street_addresses AS sa USING INDEX sa_hn_ci (house_number=?)
0           1           1           SEARCH TABLE street_names AS sn USING INTEGER PRIMARY KEY (rowid=?)      
0           2           2           SEARCH TABLE postal_codes AS pc USING INTEGER PRIMARY KEY (rowid=?)      
0           3           3           SEARCH TABLE cities AS ci USING INTEGER PRIMARY KEY (rowid=?)            
0           4           4           SEARCH TABLE municipalities AS mu USING INTEGER PRIMARY KEY (rowid=?)    
0           5           5           SEARCH TABLE counties AS co USING INTEGER PRIMARY KEY (rowid=?)          
0           0           0           USE TEMP B-TREE FOR ORDER BY

Best answer

It turns out that when my SELECT query uses a WHERE part like this:

WHERE
  sn.name GLOB ? AND
  sa.house_number = ?

SQLite3 picks the index sa_hn_ci (house_number, city) instead of sa_unique_address. This makes the query run about 100 times slower.

Whenever my query includes the street name, I now work around this by using INDEXED BY:

SELECT
  sn.name, sa.house_number, sa.entrance, pc.postal_code,
  ci.name, mu.name, co.name, sa.latitude, sa.longitude
FROM
  street_addresses AS sa INDEXED BY sa_unique_address          -- This line!
  INNER JOIN street_names   AS sn ON sa.street_name  = sn.id
  INNER JOIN postal_codes   AS pc ON sa.postal_code  = pc.id
  INNER JOIN cities         AS ci ON sa.city         = ci.id
  INNER JOIN municipalities AS mu ON sa.municipality = mu.id
  INNER JOIN counties       AS co ON mu.county       = co.id
WHERE
  sn.name GLOB 'FORNEBUVEIEN' AND
  sa.house_number=11
ORDER BY
  ci.name ASC, sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT
  0, 100;
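To see the effect of the hint without the full data set, the plan with and without INDEXED BY can be compared on an empty in-memory database. This is a sketch under assumptions: Python's sqlite3 module, only the two tables needed for the street-name filter, no ANALYZE statistics, and a helper named plan that exists only for this example. On an empty database the planner may well choose differently than it did on the real data, so the unhinted plan is printed for comparison but nothing is claimed about it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE street_names (
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL
);
CREATE UNIQUE INDEX sn_name ON street_names (name);

CREATE TABLE street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  street_name INTEGER NOT NULL REFERENCES street_names,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL
);
CREATE INDEX sa_hn_pc ON street_addresses (house_number, postal_code);
CREATE INDEX sa_hn_ci ON street_addresses (house_number, city);
CREATE UNIQUE INDEX sa_unique_address
  ON street_addresses (street_name, house_number, entrance, postal_code, city);
""")

# {hint} is replaced by either nothing or an INDEXED BY clause.
QUERY = """
SELECT sn.name, sa.house_number, sa.entrance
FROM street_addresses AS sa {hint}
  INNER JOIN street_names AS sn ON sa.street_name = sn.id
WHERE sn.name GLOB 'FORNEBUVEIEN' AND sa.house_number = 11
ORDER BY sn.name ASC, sa.house_number ASC, sa.entrance ASC
LIMIT 0, 100
"""

def plan(hint=""):
    """Return the detail text of each EXPLAIN QUERY PLAN row (always the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY.format(hint=hint)).fetchall()
    return [row[-1] for row in rows]

unhinted = plan()
hinted = plan("INDEXED BY sa_unique_address")

print("\n".join(unhinted))
print("\n".join(hinted))
```

INDEXED BY is a hard requirement rather than a suggestion: if SQLite cannot build a plan that uses the named index, preparing the statement fails instead of silently falling back to another index.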

But I don't know why SQLite3 picked the wrong index in the first place. Running ANALYZE didn't change anything.
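One way to check what ANALYZE actually gave the planner is to read the sqlite_stat1 table that it writes. A minimal sketch, assuming Python's sqlite3 module and a handful of invented rows (the values are made up for illustration and are not taken from the real address data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE street_addresses (
  id INTEGER PRIMARY KEY NOT NULL,
  house_number INTEGER NOT NULL,
  entrance TEXT NOT NULL,
  street_name INTEGER NOT NULL,
  postal_code INTEGER NOT NULL,
  city INTEGER NOT NULL
);
CREATE INDEX sa_hn_ci ON street_addresses (house_number, city);
CREATE UNIQUE INDEX sa_unique_address
  ON street_addresses (street_name, house_number, entrance, postal_code, city);
""")

# Invented sample rows: (house_number, entrance, street_name, postal_code, city)
sample_rows = [
    (11, "A", 1, 100, 1),
    (11, "B", 1, 100, 1),
    (12, "", 1, 100, 1),
    (11, "", 2, 200, 2),
]
conn.executemany(
    "INSERT INTO street_addresses "
    "(house_number, entrance, street_name, postal_code, city) "
    "VALUES (?, ?, ?, ?, ?)",
    sample_rows,
)

# ANALYZE gathers per-index statistics and stores them in sqlite_stat1.
conn.execute("ANALYZE")

stats = dict(
    conn.execute(
        "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'street_addresses'"
    ).fetchall()
)
for index_name, stat in sorted(stats.items()):
    print(index_name, "->", stat)
```

Each stat string starts with the number of rows in the index, followed by the average number of rows matched by each successively longer prefix of the index's columns. Comparing those numbers for sa_hn_ci and sa_unique_address on the real database may give a hint about why one index looks cheaper to the planner than the other.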

I am not marking this answer as correct.

Regarding "sql - Why did changing my table schema slow down my queries?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23657721/
