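I have an 800 MB MS Access database that I migrated to SQLite. The database structure is as follows (after migration the SQLite database is about 330 MB):
The Occurrence table has 1,600,000 records. It is defined as: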
CREATE TABLE Occurrence
(
SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
PRIMARY KEY (SimulationID, SimRunID, OccurrenceID)
)
It has the following indexes:
CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded" ASC)
CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" ("OccurrenceID" ASC)
CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" ASC)
CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" ("SimulationID" ASC)
The OccurrenceParticipant table has 3,400,000 records. It is defined as:
CREATE TABLE OccurrenceParticipant
(
SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
RoleTypeID INTEGER, ParticipantID INTEGER
)
It has the following indexes:
CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON "OccurrenceParticipant" ("OccurrenceID" ASC)
CREATE INDEX "OccurrenceParticipant_ParticipantID_idx" ON "OccurrenceParticipant" ("ParticipantID" ASC)
CREATE INDEX "OccurrenceParticipant_RoleType_idx" ON "OccurrenceParticipant" ("RoleTypeID" ASC)
CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant" ("SimRunID" ASC)
CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON "OccurrenceParticipant" ("SimulationID" ASC)
The InitialParticipant table has 130 records. It is defined as:
CREATE TABLE InitialParticipant
(
ParticipantID INTEGER PRIMARY KEY, ParticipantTypeID INTEGER,
ParticipantGroupID INTEGER
)
It has the following indexes:
CREATE INDEX "initialpart_participantTypeID_idx" ON "InitialParticipant" ("ParticipantGroupID" ASC)
CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant" ("ParticipantID" ASC)
The ParticipantGroup table has 22 records. It is defined as:
CREATE TABLE ParticipantGroup (
ParticipantGroupID INTEGER, ParticipantGroupTypeID INTEGER,
Description varchar (50), PRIMARY KEY( ParticipantGroupID )
)
It has the following index:
CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON "ParticipantGroup" ("ParticipantGroupID" ASC)
The tmpSimArgs table has 18 records. It has the following structure:
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10))
and the following indexes:
CREATE INDEX tmpSimArgs_SimRunID_idx ON tmpSimArgs(SimRunID ASC)
CREATE INDEX tmpSimArgs_SimulationID_idx ON tmpSimArgs(SimulationID ASC)
The tmpPartArgs table has 80 records. It has the following structure:
CREATE TABLE tmpPartArgs(participantID INT)
and the following index:
CREATE INDEX tmpPartArgs_participantID_idx ON tmpPartArgs(participantID ASC)
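I have a query involving several INNER JOINs. The problem is that the Access version of the query takes about 1 second, while the SQLite version of the same query takes about 10 seconds (roughly 10 times slower!). Migrating back to Access is not possible for me; SQLite is my only option.
I am new to writing database queries, so these queries may look silly. Please point out anything you see that is wrong or naive.
The query in Access is (the whole query takes about 1 second to execute):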
SELECT ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period, Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID FROM
(
ParticipantGroup INNER JOIN InitialParticipant ON ParticipantGroup.ParticipantGroupID = InitialParticipant.ParticipantGroupID
) INNER JOIN
(
tmpPartArgs INNER JOIN
(
(
tmpSimArgs INNER JOIN Occurrence ON (tmpSimArgs.SimRunID = Occurrence.SimRunID) AND (tmpSimArgs.SimulationID = Occurrence.SimulationID)
) INNER JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID = OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID = OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID = OccurrenceParticipant.SimulationID)
) ON tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID
) ON InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID WHERE (((OccurrenceParticipant.RoleTypeID)=52 Or (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period;
The SQLite query is as follows (it takes about 10 seconds):
SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period, Count(ij2.occpParticipantID) AS CountOfParticipantID FROM
(
SELECT ip.ParticipantGroupID AS ipParticipantGroupID, ip.ParticipantID AS ipParticipantID, ip.ParticipantTypeID, pg.ParticipantGroupID AS pgParticipantGroupID, pg.ParticipantGroupTypeID, pg.Description FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1 INNER JOIN
(
SELECT tpa.participantID AS tpaParticipantID, ij3.* FROM tmpPartArgs AS tpa INNER JOIN
(
SELECT ij4.*, occp.SimulationID as occpSimulationID, occp.SimRunID AS occpSimRunID, occp.OccurrenceID AS occpOccurrenceID, occp.ParticipantID AS occpParticipantID, occp.RoleTypeID FROM
(
SELECT tsa.SimulationID AS tsaSimulationID, tsa.SimRunID AS tsaSimRunID, occ.SimulationID AS occSimulationID, occ.SimRunID AS occSimRunID, occ.OccurrenceID AS occOccurrenceID, occ.OccurrenceTypeID, occ.Period, occ.HasSucceeded FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID)
) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID = occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID)
) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1 GROUP BY ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period;
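I don't know what I am doing wrong here. I have indexes on everything, but I suspect I am missing some key index that would do the trick. Ironically, my "research" before migrating suggested that SQLite is faster, smaller, and better than Access in every respect, yet I cannot get this query to run faster in SQLite than in Access. To reiterate, I am new to SQLite and clearly lack experience with it, so I would be grateful if anyone knowledgeable could help me with this.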
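One way to see which indexes SQLite actually picks for the join between the two large tables is EXPLAIN QUERY PLAN. The following is a minimal Python sketch, not a measured fix. It uses the standard sqlite3 module on an empty in-memory copy of the schema. The helper `query_plan`, the reduced query `JOIN_SQL`, and the composite index `OccurrenceParticipant_key_idx` are assumptions introduced for illustration and are not part of the original schema. The composite index is only a candidate: OccurrenceParticipant is joined on three columns, but the schema above has only single-column indexes on it.

```python
import sqlite3

# The join between the two large tables, reduced to the columns that matter here.
JOIN_SQL = """
SELECT occ.Period, COUNT(occp.ParticipantID)
FROM Occurrence AS occ
INNER JOIN OccurrenceParticipant AS occp
    ON occ.OccurrenceID = occp.OccurrenceID
   AND occ.SimRunID = occp.SimRunID
   AND occ.SimulationID = occp.SimulationID
WHERE occ.HasSucceeded = 1
GROUP BY occ.Period
"""


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


# Empty in-memory copy of the two big tables with some of their original indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Occurrence (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
    PRIMARY KEY (SimulationID, SimRunID, OccurrenceID)
);
CREATE TABLE OccurrenceParticipant (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    RoleTypeID INTEGER, ParticipantID INTEGER
);
CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded" ASC);
CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON "OccurrenceParticipant" ("OccurrenceID" ASC);
CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant" ("SimRunID" ASC);
CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON "OccurrenceParticipant" ("SimulationID" ASC);
""")

plan_before = query_plan(conn, JOIN_SQL)

# ASSUMPTION: hypothetical composite index, not present in the original schema.
conn.execute(
    "CREATE INDEX OccurrenceParticipant_key_idx "
    "ON OccurrenceParticipant (SimulationID, SimRunID, OccurrenceID)"
)
plan_after = query_plan(conn, JOIN_SQL)

for label, plan in (("before", plan_before), ("after", plan_after)):
    print(label)
    for step in plan:
        print("   ", step)
```

The exact plan text depends on the SQLite version and on the data, so the printed steps are something to read, not something to rely on. On the real database the same helper can be pointed at the full query, and running ANALYZE beforehand gives the planner table statistics to work with.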
Best answer
I have reformatted your code (using my home-grown SQL formatter) to hopefully make it easier for others to read.
Reformatted query:
SELECT
ij1.Description,
ij2.occSimulationID,
ij2.occSimRunID,
ij2.Period,
Count(ij2.occpParticipantID) AS CountOfParticipantID
FROM (
SELECT
ip.ParticipantGroupID AS ipParticipantGroupID,
ip.ParticipantID AS ipParticipantID,
ip.ParticipantTypeID,
pg.ParticipantGroupID AS pgParticipantGroupID,
pg.ParticipantGroupTypeID,
pg.Description
FROM ParticipantGroup AS pg
INNER JOIN InitialParticipant AS ip
ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1
INNER JOIN (
SELECT
tpa.participantID AS tpaParticipantID,
ij3.*
FROM tmpPartArgs AS tpa
INNER JOIN (
SELECT
ij4.*,
occp.SimulationID AS occpSimulationID,
occp.SimRunID AS occpSimRunID,
occp.OccurrenceID AS occpOccurrenceID,
occp.ParticipantID AS occpParticipantID,
occp.RoleTypeID
FROM (
SELECT
tsa.SimulationID AS tsaSimulationID,
tsa.SimRunID AS tsaSimRunID,
occ.SimulationID AS occSimulationID,
occ.SimRunID AS occSimRunID,
occ.OccurrenceID AS occOccurrenceID,
occ.OccurrenceTypeID,
occ.Period,
occ.HasSucceeded
FROM tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
ON (tsa.SimRunID = occ.SimRunID)
AND (tsa.SimulationID = occ.SimulationID)
) AS ij4
INNER JOIN OccurrenceParticipant AS occp
ON (occOccurrenceID = occpOccurrenceID)
AND (occSimRunID = occpSimRunID)
AND (occSimulationID = occpSimulationID)
) AS ij3
ON tpa.participantID = ij3.occpParticipantID
) AS ij2
ON ij1.ipParticipantID = ij2.occpParticipantID
WHERE (
(
(ij2.RoleTypeID) = 52
OR
(ij2.RoleTypeID) = 49
)
)
AND ij2.HasSucceeded = 1
GROUP BY
ij1.Description,
ij2.occSimulationID,
ij2.occSimRunID,
ij2.Period;
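Like JohnFx (see above), I was confused by the derived views. I don't think they are actually needed, particularly since they are all inner joins, so below I have tried to reduce the complexity. Please review it and test its performance. I had to use a CROSS JOIN with tmpSimArgs because it is joined only to Occurrence; I assume this is the desired behavior.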
SELECT
pg.Description,
occ.SimulationID,
occ.SimRunID,
occ.Period,
COUNT(occp.ParticipantID) AS CountOfParticipantID
FROM ParticipantGroup AS pg
INNER JOIN InitialParticipant AS ip
ON pg.ParticipantGroupID = ip.ParticipantGroupID
CROSS JOIN tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
ON tsa.SimRunID = occ.SimRunID
AND tsa.SimulationID = occ.SimulationID
INNER JOIN OccurrenceParticipant AS occp
ON occ.OccurrenceID = occp.OccurrenceID
AND occ.SimRunID = occp.SimRunID
AND occ.SimulationID = occp.SimulationID
INNER JOIN tmpPartArgs AS tpa
ON tpa.participantID = occp.ParticipantID
WHERE occ.HasSucceeded = 1
AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49 )
GROUP BY
pg.Description,
occ.SimulationID,
occ.SimRunID,
occ.Period;
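Before comparing timings, it may be worth confirming on a small sample that removing the derived views does not change the result. The sketch below is one way to do that with Python's standard sqlite3 module. It covers only the tmpSimArgs, Occurrence, and OccurrenceParticipant part of the query, and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10));
CREATE TABLE Occurrence (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
    PRIMARY KEY (SimulationID, SimRunID, OccurrenceID)
);
CREATE TABLE OccurrenceParticipant (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    RoleTypeID INTEGER, ParticipantID INTEGER
);
-- Made-up sample rows (not from the original database).
INSERT INTO tmpSimArgs VALUES (1, 1);
INSERT INTO Occurrence VALUES
    (1, 1, 10, 0, 5, 1),   -- selected simulation, succeeded
    (1, 1, 11, 0, 5, 0),   -- selected simulation, did not succeed
    (2, 1, 12, 0, 6, 1);   -- simulation not listed in tmpSimArgs
INSERT INTO OccurrenceParticipant VALUES
    (1, 1, 10, 52, 100),
    (1, 1, 10, 49, 101),
    (1, 1, 10, 7, 102),    -- role type that the WHERE clause filters out
    (1, 1, 11, 52, 100),
    (2, 1, 12, 52, 100);
""")

# Nested form: a derived view wraps the tmpSimArgs/Occurrence join.
NESTED_SQL = """
SELECT ij4.occSimulationID, ij4.occSimRunID, ij4.Period,
       COUNT(occp.ParticipantID) AS CountOfParticipantID
FROM (
    SELECT occ.SimulationID AS occSimulationID,
           occ.SimRunID AS occSimRunID,
           occ.OccurrenceID AS occOccurrenceID,
           occ.Period, occ.HasSucceeded
    FROM tmpSimArgs AS tsa
    INNER JOIN Occurrence AS occ
        ON tsa.SimRunID = occ.SimRunID
       AND tsa.SimulationID = occ.SimulationID
) AS ij4
INNER JOIN OccurrenceParticipant AS occp
    ON ij4.occOccurrenceID = occp.OccurrenceID
   AND ij4.occSimRunID = occp.SimRunID
   AND ij4.occSimulationID = occp.SimulationID
WHERE ij4.HasSucceeded = 1
  AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49)
GROUP BY ij4.occSimulationID, ij4.occSimRunID, ij4.Period
ORDER BY 1, 2, 3
"""

# Flat form: the same joins written without a derived view.
FLAT_SQL = """
SELECT occ.SimulationID, occ.SimRunID, occ.Period,
       COUNT(occp.ParticipantID) AS CountOfParticipantID
FROM tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
    ON tsa.SimRunID = occ.SimRunID
   AND tsa.SimulationID = occ.SimulationID
INNER JOIN OccurrenceParticipant AS occp
    ON occ.OccurrenceID = occp.OccurrenceID
   AND occ.SimRunID = occp.SimRunID
   AND occ.SimulationID = occp.SimulationID
WHERE occ.HasSucceeded = 1
  AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49)
GROUP BY occ.SimulationID, occ.SimRunID, occ.Period
ORDER BY 1, 2, 3
"""

nested_rows = conn.execute(NESTED_SQL).fetchall()
flat_rows = conn.execute(FLAT_SQL).fetchall()
print(nested_rows == flat_rows, flat_rows)
```

With these sample rows both forms return the single group (1, 1, 5) with a count of 2. A check like this only shows agreement on the sample, so the full query should still be compared against the real data.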
A similar question about a SQLite query running 10 times slower than the MS Access query can be found on Stack Overflow: https://stackoverflow.com/questions/2017114/