sql - Postgres doesn't use an index for a slow function

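My database design uses a lot of functions, and many of them are very slow. So I thought it would be a good idea to create indexes on some of them to speed up execution. However, I have not managed to convince PostgreSQL (9.6) to actually use my indexes.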

Consider this table "user":

id integer | name jsonb
1          | {"last_names": ["Tester"], "first_names": ["Teddy","Eddy"]} 
2          | {"last_names": ["Miller"], "first_names": ["Lisa","Emma"]}

I usually need the name as a single string, which I build with a query like this (call it "concat_name"):

SELECT array_to_string(jsonb_arr2text_arr(name->'last_names'), ' ') || ', ' || array_to_string(jsonb_arr2text_arr(name->'first_names'), ' ');
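The expression relies on jsonb_arr2text_arr, a helper the question does not show. The sketch below is a hypothetical definition, not the asker's. It assumes the helper simply turns a JSON array of strings into a text[] in element order, which makes the expression runnable on the first sample row:

```sql
-- Hypothetical helper: the question calls jsonb_arr2text_arr but never defines it.
-- Assumption: it converts a JSON array of strings into text[], keeping element order.
CREATE OR REPLACE FUNCTION jsonb_arr2text_arr(arr jsonb)
  RETURNS text[]
  LANGUAGE sql IMMUTABLE AS
$$
  SELECT array(
    SELECT e
    FROM jsonb_array_elements_text(arr) WITH ORDINALITY AS x(e, i)
    ORDER BY i  -- keep the original array order explicitly
  )
$$;

-- The "concat_name" expression applied to the first row of "user".
SELECT array_to_string(jsonb_arr2text_arr(name->'last_names'), ' ')
       || ', ' ||
       array_to_string(jsonb_arr2text_arr(name->'first_names'), ' ') AS full_name
FROM (VALUES ('{"last_names": ["Tester"], "first_names": ["Teddy","Eddy"]}'::jsonb)) AS t(name);
-- full_name = 'Tester, Teddy Eddy'
```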

I decided to put this logic into a function, since it is used for several tables:

CREATE OR REPLACE FUNCTION public.concat_name(name jsonb)
  RETURNS text AS
$BODY$
  SELECT pg_sleep(50);
  SELECT array_to_string(jsonb_arr2text_arr(name->'last_names'), ' ') || ', ' || array_to_string(jsonb_arr2text_arr(name->'first_names'), ' ');
$BODY$
  LANGUAGE sql IMMUTABLE SECURITY DEFINER
  COST 100;

As you can see, to actually test whether it works, I added an "artificial" delay (the pg_sleep call). Then I created an index like this:

CREATE INDEX user_concat_name_idx ON "user" (concat_name(name));

This succeeded and took the expected time (because of pg_sleep). Then I ran a query:

SELECT concat_name(name) FROM "user";

But the index is not used and the query is slow. Instead, EXPLAIN shows the planner doing a sequential scan on "user".

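I did some research, and many people say the query planner considers a sequential scan more efficient than an index lookup if the table is small or if the retrieved data set is (almost) the entire table. For functions, and especially slow functions, this makes no sense to me. Even if you query a table with only one row, a functional index could cut execution time dramatically when the query calls a function that takes 50 seconds per execution.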

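So, in my view, the query planner ought to compare the time needed to look up a value in the index with the time needed to execute the function. The size of the table, or of the result (how many rows are returned), should not matter at all here. And if the function takes 50 seconds to execute, the index lookup should always win.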
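The planner has no way of knowing that a call takes 50 seconds. As the documentation quoted in the answer explains, it only sees the declared COST, expressed in units of cpu_operator_cost. A quick way to see what it assumes per call (a sketch, assuming concat_name(jsonb) is installed as above and the server runs with the default cpu_operator_cost):

```sql
-- Declared cost of concat_name and the resulting planner estimate per call.
-- procost comes from the COST clause (100 in the question's definition);
-- cpu_operator_cost defaults to 0.0025.
SELECT p.proname,
       p.procost,
       p.procost * current_setting('cpu_operator_cost')::float8 AS est_cost_per_call
FROM pg_proc AS p
WHERE p.oid = 'concat_name(jsonb)'::regprocedure;
-- With COST 100 and the default cpu_operator_cost: est_cost_per_call = 0.25
```

The pg_sleep(50) inside the body does not enter this estimate at all, which is why the planner's choice can look unreasonable from the outside.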

So what can I do to make the query planner use the index instead of executing the function anew each time?

Best answer

First, if you want to use it in a query that selects only concat_name(name), an index on (id, concat_name(name)) makes no sense. The index should be:

create index user_concat_name_idx on "user" (concat_name(name));

Second, the index will be used when it is actually useful, for example when you add order by concat_name(name):

explain analyse
select concat_name(name)
from "user"
order by 1;

                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using user_concat_name_idx on "user"  (cost=0.42..29928.42 rows=100000 width=82) (actual time=0.157..1046.168 rows=100000 loops=1)
 Planning time: 0.753 ms
 Execution time: 1048.862 ms
(3 rows)
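ORDER BY is one such case; a filter on the same expression is another, because the planner can match a WHERE condition against the indexed expression concat_name(name). A sketch, assuming the user_concat_name_idx index above exists; the literal is just the first sample row, and no plan output is shown because whether the index is chosen still depends on table size and statistics:

```sql
-- Equality lookup on the indexed expression. With enough rows, the planner
-- can answer this through user_concat_name_idx instead of scanning "user".
-- Plain EXPLAIN (no ANALYZE) shows the plan without running the slow function.
EXPLAIN
SELECT id
FROM "user"
WHERE concat_name(name) = 'Tester, Teddy Eddy';
```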

Also, you can make your function simpler and faster:

create or replace function concat_name(name jsonb)
returns text language sql immutable as $$
    select concat_ws(', ',
        (select string_agg(value, ' ')
        from jsonb_array_elements_text(name->'last_names')),
        (select string_agg(value, ' ')
        from jsonb_array_elements_text(name->'first_names'))
    )
$$;
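To try the simplified version, it can be called on the sample rows directly. The third row below is a made-up input without "first_names". It shows a behavioral detail worth knowing: concat_ws skips NULL arguments, so such a row yields just the last name, whereas a chain of || returns NULL as soon as any operand is NULL.

```sql
-- Calls the simplified concat_name on the two sample rows from the question
-- plus a hypothetical third row that has no "first_names" key.
SELECT v.name, concat_name(v.name) AS full_name
FROM (VALUES
        ('{"last_names": ["Tester"], "first_names": ["Teddy", "Eddy"]}'::jsonb),
        ('{"last_names": ["Miller"], "first_names": ["Lisa", "Emma"]}'::jsonb),
        ('{"last_names": ["Solo"]}'::jsonb)
     ) AS v(name);
-- full_name values: 'Tester, Teddy Eddy', 'Miller, Lisa Emma', 'Solo'
```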

As for "What can I do to make the query planner use the index instead of executing the function anew each time?": you should declare a higher cost for the function, for example:

create or replace function concat_name(name jsonb)
returns text language sql immutable as $$
-- ...
$$
cost 1000;
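If the function body should stay untouched, the cost can also be changed on its own with ALTER FUNCTION, and pg_proc then shows the value the planner will use. A sketch, assuming the single-argument concat_name(jsonb) from above; 1000 is just the example value from this answer, not a tuned number:

```sql
-- Raise the declared cost without re-creating the function.
ALTER FUNCTION concat_name(jsonb) COST 1000;

-- Confirm the new value (procost is in units of cpu_operator_cost).
SELECT proname, procost
FROM pg_proc
WHERE oid = 'concat_name(jsonb)'::regprocedure;
```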

According to the documentation:

execution_cost

A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.

A similar question, "sql - Postgres doesn't use an index for a slow function", can be found on Stack Overflow: https://stackoverflow.com/questions/44023031/
