PostgreSQL function inlining behavior

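Can anyone confirm the expected behavior of function inlining in PostgreSQL?

In the Microsoft SQL Server world, any function that is determined to be inline is used in a single pass over multiple rows: the function body is essentially injected into the calling statement, making it set-based (one call) rather than probing once per input row (many calls).

Without a profiler like the one in MSSQL, my team and I struggled to verify this, but we eventually managed to. We found that the number of iterations of a function we believed to be inlined is proportional to the number of rows it runs over.

We did this by introducing a deliberate wait (pg_sleep) into the function. Waiting N seconds resulted in a total execution time of rows * N: on 6 input rows, a 1-second wait took 6 seconds, a 2-second wait took 12 seconds, and so on.

So our questions are:

Does inlining in PostgreSQL work the way we think it does (equivalent to MSSQL's inline functions [Type='IF'])?
Is there a profiling tool, like Profiler in MSSQL, that shows this clearly?
Are there any metadata flags we can look at to confirm or refute that our functions are actually inlinable?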

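Best answer

The term "inlining" has a different meaning in Postgres. It usually refers to language sql functions which, when used inside another query, are completely replaced by the query they contain. For example, this function: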

create or replace function customers_in_zip(p_zip_code varchar(5))
  returns setof customers
as
$$
  select *
  from customers
  where zip_code = p_zip_code;
$$
language sql
stable;

used like this:

select *
from orders o 
  join customers_in_zip('42') c on o.customer_id = c.id;

will be expanded by the optimizer to:

select *
from orders o 
  join customers c on o.customer_id = c.id and c.zip_code = '42';

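This kind of inlining can be seen in the execution plan generated with explain (analyze). For it to happen, the function must be marked as immutable or stable; the default volatility, volatile, prevents it.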
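To produce such a plan for the query above, prefix it with explain. A minimal sketch, assuming the orders and customers tables from the example exist; the verbose and buffers options are optional and only add detail to the output:

```sql
-- analyze actually runs the query and reports real timings and row counts;
-- verbose adds output columns and schema-qualified names;
-- buffers adds buffer usage to each plan node
explain (analyze, verbose, buffers)
select *
from orders o
  join customers_in_zip('42') c on o.customer_id = c.id;
```

If the plan contains a "Function Scan" node on customers_in_zip, the function was not inlined.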

For example, if the function can be inlined, the plan looks like this:

Nested Loop  (cost=2.39..200.79 rows=79 width=52) (actual time=0.021..0.165 rows=115 loops=1)
  ->  Bitmap Heap Scan on public.customers  (cost=1.97..20.71 rows=13 width=28) (actual time=0.014..0.023 rows=15 loops=1)
        Recheck Cond: ((customers.zip_code)::text = '80807'::text)
        ->  Bitmap Index Scan on customers_zip_code_idx  (cost=0.00..1.96 rows=13 width=0) (actual time=0.010..0.010 rows=15 loops=1)
              Index Cond: ((customers.zip_code)::text = '80807'::text)
  ->  Index Scan using idx_orders_cust_id on public.orders o  (cost=0.42..13.84 rows=8 width=24) (actual time=0.003..0.008 rows=8 loops=15)
        Index Cond: (o.customer_id = customers.id)

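As you can see, there is no reference to the function; the plan for the query without the function looks almost identical.

If the function is not inlined (e.g. because it was not declared stable, or because it is a PL/pgSQL function rather than a SQL function), the plan looks like this: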

Nested Loop  (cost=0.68..139.94 rows=77 width=110) (actual time=0.710..0.862 rows=115 loops=1)
  ->  Function Scan on public.customers_in_zip c  (cost=0.25..0.26 rows=10 width=86) (actual time=0.696..0.697 rows=15 loops=1)
        Function Call: customers_in_zip('42'::character varying)
        Buffers: shared hit=18
  ->  Index Scan using idx_orders_cust_id on public.orders o  (cost=0.42..13.96 rows=8 width=24) (actual time=0.004..0.009 rows=8 loops=15)
        Output: o.id, o.customer_id, o.order_date, o.amount, o.sales_person_id
        Index Cond: (o.customer_id = c.id)
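As far as I know there is no single metadata flag that says "this function will be inlined", but the properties mentioned above (language and volatility) can be looked up in the system catalog pg_proc. A sketch, assuming the function name from the example; these columns only show preconditions, and the execution plan remains the actual confirmation:

```sql
select p.proname,
       l.lanname,        -- 'sql' is required for inlining; 'plpgsql' is never inlined
       p.provolatile,    -- 'i' = immutable, 's' = stable, 'v' = volatile
       p.proisstrict,    -- true if declared strict
       p.prosecdef,      -- true if declared security definer
       p.proretset       -- true if the function returns a set
from pg_proc p
  join pg_language l on l.oid = p.prolang
where p.proname = 'customers_in_zip';
```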

From your description, it seems you are not referring to that kind of "inlining", but rather to whether a scalar function is called only once if it does not depend on values taken from the row, e.g.:

select col1, some_function(), col2
from some_table;

If some_function() is declared immutable, it will only be called once.

Quote from the manual:

IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; [...] If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.

This is not something you can see directly in the execution plan, but the following demonstrates it:

create function expensive_scalar(p_some_input integer)
  returns integer
as 
$$
begin
  perform pg_sleep(10);
  return p_some_input * 2;
end;  
$$
language plpgsql
immutable;

The perform pg_sleep(10); makes the function take 10 seconds to execute. The following query would call the function one hundred times:

select i, expensive_scalar(2)
from generate_series(1,100) i;

But the execution time is only just over 10 seconds, which clearly shows that the function is called only once.
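By contrast, when the argument depends on the row, the call cannot be replaced by a constant and the function is evaluated for every row, which is the rows * N behavior described in the question. A minimal sketch reusing expensive_scalar from above (the three-row range is only chosen to keep the wait short):

```sql
-- the argument i differs per row, so the call is not "all-constant"
-- and the 10-second sleep is incurred once for each generated row
select i, expensive_scalar(i)
from generate_series(1,3) i;
```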

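As far as I know, declaring a function stable also allows Postgres to collapse repeated calls with the same input values into a single call during the execution of one statement, although it is not guaranteed to do so.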

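That is a bit harder to prove, though. Typically you can do it by putting raise notice statements (the Postgres equivalent of print) into the function and checking how often they are printed.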
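A sketch of that approach, using a hypothetical function noisy_scalar (the name and the five-row range are made up for illustration). Counting the NOTICE lines the client prints shows how many times the function body actually ran:

```sql
create function noisy_scalar(p_some_input integer)
  returns integer
as
$$
begin
  -- one NOTICE is emitted per actual execution of the function body
  raise notice 'noisy_scalar called with %', p_some_input;
  return p_some_input * 2;
end;
$$
language plpgsql
stable;

select i, noisy_scalar(2)
from generate_series(1,5) i;
```

Changing stable to immutable or volatile and re-running the query allows the number of notices to be compared across the three volatility settings.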

A similar question on PostgreSQL function inlining behavior can be found on Stack Overflow: https://stackoverflow.com/questions/53040170/
