sql - PostgreSQL - query optimization

Tags: sql postgresql performance query-optimization

I have the query below, which takes about 15-20 seconds to run.

with cte0 as (
    SELECT
        label,
        date,
        CASE
            WHEN
                Lead(label || date || "number") OVER (PARTITION BY label || date || "number" ORDER BY "label", "date", "number", "time") IS NULL
            THEN
                '1'::numeric
            ELSE
                '0'::numeric
        END As "unique"
    FROM table_data
    LEFT JOIN table_mapper ON
        table_mapper."type" = table_data."type"
    WHERE Date BETWEEN date_trunc('month', current_date - 1) and current_date - 1
)
SELECT 'MTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1
UNION ALL
SELECT 'Week' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('week', current_date - 1) AND current_date -1
UNION ALL
SELECT 'FTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" = current_date -1

On table_data, I have an index on the date column.

CREATE INDEX ix_cli_date
  ON table_data
  USING btree
  (date);

Table definition (\d table_data)

Table "public.table_data"
      Column      |          Type          | Modifiers
------------------+------------------------+-----------
 date             | date                   | not null
 number           | bigint                 | not null
 time             | time without time zone | not null
 end time         | time without time zone | not null
 duration         | integer                | not null
 time1            | integer                | not null
 time2            | integer                | not null
 time3            | integer                | not null
 time4            | integer                | not null
 time5            | integer                | not null
 time6            | integer                | not null
 time7            | integer                | not null
 type             | text                   | not null
 name             | text                   | not null
 id1              | integer                | not null
 id2              | integer                | not null
 key              | integer                | not null
 status           | text                   | not null
Indexes:
    "ix_cli_date" btree (date)

Table definition (\d table_mapper)

 Table "public.table_mapper"
   Column   | Type | Modifiers
------------+------+-----------
 type       | text | not null
 label      | text | not null
 label2     | text | not null
 label3     | text | not null
 label4     | text | not null
 label5     | text | not null

EXPLAIN ANALYZE output

Result  (cost=184342.66..230332.86 rows=3 width=64) (actual time=23377.923..25695.478 rows=3 loops=1)
  CTE cte0
    ->  WindowAgg  (cost=121516.06..156751.65 rows=612793 width=23) (actual time=14578.000..18985.958 rows=696157 loops=1)
          ->  Sort  (cost=121516.06..123048.04 rows=612793 width=23) (actual time=14577.975..17084.405 rows=696157 loops=1)
                Sort Key: (((table_mapper.label || (table_data.date)::text) || (table_data."number")::text)), table_mapper.label, table_data.date, table_data."number", table_data."time"
                Sort Method: external merge  Disk: 39480kB
                ->  Hash Left Join  (cost=11.96..37474.21 rows=612793 width=23) (actual time=1.449..3308.718 rows=696157 loops=1)
                      Hash Cond: (table_data."type" = table_mapper."type")
                      ->  Index Scan using ix_cli_date on table_data  (cost=0.02..29036.36 rows=612793 width=38) (actual time=0.141..946.648 rows=696157 loops=1)
                            Index Cond: ((date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)) AND (date <= (('now'::text)::date - 1)))
                      ->  Hash  (cost=7.53..7.53 rows=353 width=25) (actual time=1.275..1.275 rows=336 loops=1)
                            Buckets: 1024  Batches: 1  Memory Usage: 15kB
                            ->  Seq Scan on table_mapper  (cost=0.00..7.53 rows=353 width=25) (actual time=0.020..0.589 rows=336 loops=1)
  ->  Append  (cost=27591.00..73581.21 rows=3 width=64) (actual time=23377.920..25695.467 rows=3 loops=1)
        ->  Aggregate  (cost=27591.00..27591.02 rows=1 width=32) (actual time=23377.917..23377.918 rows=1 loops=1)
              ->  CTE Scan on cte0  (cost=0.00..27575.68 rows=3064 width=32) (actual time=14578.052..22335.236 rows=696157 loops=1)
                    Filter: ((date <= (('now'::text)::date - 1)) AND (date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)))
        ->  Aggregate  (cost=27591.00..27591.02 rows=1 width=32) (actual time=1741.509..1741.510 rows=1 loops=1)
              ->  CTE Scan on cte0  (cost=0.00..27575.68 rows=3064 width=32) (actual time=20.009..1522.352 rows=168261 loops=1)
                    Filter: ((date <= (('now'::text)::date - 1)) AND (date >= date_trunc('week'::text, ((('now'::text)::date - 1))::timestamp with time zone)))
        ->  Aggregate  (cost=18399.11..18399.13 rows=1 width=32) (actual time=576.029..576.030 rows=1 loops=1)
              ->  CTE Scan on cte0  (cost=0.00..18383.79 rows=3064 width=32) (actual time=9.308..546.735 rows=23486 loops=1)
                    Filter: (date = (('now'::text)::date - 1))
Total runtime: 25710.506 ms

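Description:

I am getting the unique and duplicate counts from table_data, which is what LEAD helps me with: I assign the value 0 to the last duplicate value of a column.

Say I have 3 x's in a column. I assign the value 1 to the first 2 x's and 0 to the third x.

In the CTE, I take the whole rows from table_data within the defined date range and use lead() over the concatenated strings to do the calculation, so that each row gets a value of 1 or 0 according to the criteria.

If lead is NULL, the row counts as 1; if it is not NULL, the row counts as 0.

Then I return 3 rows, MTD, Current Week, and FTD, each computed from the sum() of the values I got from lead and the count(*) of all rows.

For MTD, I have the sum and count for the current month.

For Week, it is the current week, and FTD is yesterday.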

Best answer

WITH cte AS (
   SELECT d.thedate
        , lead(m.label) OVER (PARTITION BY m.label, d.thedate, d.number
                              ORDER BY d.thetime) AS leader
   FROM   table_data d
   LEFT   JOIN table_mapper m USING (type)
   WHERE  thedate BETWEEN date_trunc('month', current_date - 1)
                  AND current_date - 1
   )

SELECT 'MTD' AS label, round(count(leader)::numeric / count(*) * 100, 1) AS val
FROM   cte

UNION ALL
SELECT 'Week', round(count(leader)::numeric / count(*) * 100, 1)
FROM   cte
WHERE  thedate BETWEEN date_trunc('week', current_date - 1) AND current_date - 1

UNION ALL
SELECT 'FTD', round(count(leader)::numeric / count(*) * 100, 1)
FROM   cte
WHERE  thedate = current_date - 1;

The CTE makes sense for big tables, since the data only has to be scanned once. For smaller tables it may be faster without ...

Use thedate instead of date, which is a reserved word in standard SQL. Likewise thetime and uni instead of time and unique, and so on.

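The lead() call is simplified. You get either a value from the following row or NULL, and that seems to be the only relevant information.
Repeating the columns of the PARTITION BY clause in the ORDER BY clause of a window function is a pointless waste.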
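
To make the behavior of the simplified lead() call concrete, here is a minimal sketch on made-up rows. The sample values are purely illustrative; only the column names follow the query above.

```sql
-- Hypothetical sample data: three rows share (label, thedate, number),
-- one row stands alone.
SELECT label, thedate, number, thetime
     , lead(label) OVER (PARTITION BY label, thedate, number
                         ORDER BY thetime) AS leader
FROM  (VALUES
         ('x', date '2014-04-01', 1, time '09:00')
       , ('x', date '2014-04-01', 1, time '10:00')
       , ('x', date '2014-04-01', 1, time '11:00')
       , ('y', date '2014-04-01', 2, time '09:30')
      ) AS t(label, thedate, number, thetime)
ORDER  BY label, thetime;
```

The first two 'x' rows get leader = 'x' because a later row exists in the same partition. The last 'x' row and the single 'y' row get NULL. Ordering within the partition only needs thetime, since the other columns are constant there by definition.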

Building on that, count(leader) / count(*) is a bit faster than sum(uni) / count(uni). count(column) only counts non-null values, while count(*) counts all rows.
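
The difference between the two count() forms can be checked on a throwaway set of values. The four rows below are invented for illustration:

```sql
-- Two non-null and two NULL values in a hypothetical "leader" column.
SELECT count(leader) AS non_null_leaders  -- 2: NULLs are skipped
     , count(*)      AS all_rows          -- 4: every row is counted
     , round(count(leader)::numeric / count(*) * 100, 1) AS pct  -- 50.0
FROM  (VALUES ('x'), ('x'), (NULL), (NULL)) AS t(leader);
```

The cast to numeric matters: both counts are bigint, so without it the division would be integer division.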

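The condition on the first leg of the UNION query is redundant, because the CTE is already restricted to the same date range.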
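
As a possible variation, not part of the answer above, the three aggregates could be computed in one pass with the aggregate FILTER clause. This is only a sketch under two assumptions: PostgreSQL 9.4 or later, and a result shape of one row with three columns instead of three labeled rows. The aliases in_week, in_ftd, mtd, week and ftd are made up for this example.

```sql
-- Sketch only: requires the aggregate FILTER clause (PostgreSQL 9.4+).
-- nullif() turns an empty period into NULL instead of a division-by-zero error,
-- which differs from the UNION ALL version.
SELECT round(100.0 * count(leader) / nullif(count(*), 0), 1) AS mtd
     , round(100.0 * count(leader) FILTER (WHERE in_week)
                   / nullif(count(*) FILTER (WHERE in_week), 0), 1) AS week
     , round(100.0 * count(leader) FILTER (WHERE in_ftd)
                   / nullif(count(*) FILTER (WHERE in_ftd), 0), 1) AS ftd
FROM  (
   SELECT d.thedate >= date_trunc('week', current_date - 1) AS in_week
        , d.thedate  = current_date - 1                     AS in_ftd
        , lead(m.label) OVER (PARTITION BY m.label, d.thedate, d.number
                              ORDER BY d.thetime) AS leader
   FROM   table_data d
   LEFT   JOIN table_mapper m USING (type)
   WHERE  d.thedate BETWEEN date_trunc('month', current_date - 1)
                    AND current_date - 1
   ) sub;
```

Whether this beats the CTE version depends on data size and server version, so it is worth comparing both with EXPLAIN ANALYZE.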

There is more advice, with links, about the data definition in the comments to the question.

Table design / indexes

You should have primary keys. I suggest a serial or IDENTITY column as surrogate PK for table_data:

ALTER TABLE table_data ADD COLUMN table_data_id serial PRIMARY KEY;
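
For the IDENTITY variant mentioned above, a sketch might look like the following. It assumes PostgreSQL 10 or later and would be used instead of the serial statement, not in addition to it. The choice of bigint is also an assumption.

```sql
-- Alternative to the serial column: a standard-SQL identity column.
-- Assumes PostgreSQL 10+; adding it rewrites the table and numbers existing rows.
ALTER TABLE table_data
  ADD COLUMN table_data_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
```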


Make type the primary key of table_mapper (the FK constraint below requires it as well):

ALTER TABLE table_mapper ADD CONSTRAINT table_mapper_pkey PRIMARY KEY (type);

Add a foreign key constraint on type to enforce referential integrity. Something like:

ALTER TABLE table_data ADD CONSTRAINT table_data_type_fkey
  FOREIGN KEY (type) REFERENCES table_mapper (type)
  ON UPDATE CASCADE ON DELETE NO ACTION;

For ultimate read performance (at some cost to writes), add a multicolumn index, which may allow index-only scans for the above query:

CREATE INDEX table_data_foo_idx ON table_data (thedate, number, thetime);
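
One way to check whether the planner actually uses the new index is to look at the plan of a test query. The sketch below assumes the columns have been renamed to thedate and thetime as in the answer, and it deliberately reads only indexed columns.

```sql
-- Index-only scans depend on the visibility map, which VACUUM maintains.
VACUUM ANALYZE table_data;

-- Look for "Index Only Scan using table_data_foo_idx" and a low "Heap Fetches" count.
-- An index-only scan is possible only if every column the query reads
-- from table_data is part of the index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT thedate, number, thetime
FROM   table_data
WHERE  thedate BETWEEN date_trunc('month', current_date - 1) AND current_date - 1;
```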

Regarding sql - PostgreSQL - query optimization, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23339269/
