文章目錄
之前寫過一篇關于 PostgreSQL prepare sql的文章,但當時沒有提到generic plan(通用計劃)和custom plan(自定義計劃)這兩個概念。現在將通過舉例介紹這兩個概念。
創建測試表:
postgres=# create database demo;
CREATE DATABASE
postgres=# \c demo
You are now connected to database "demo" as user "postgres".
demo=# create table t1(a int,b text);
CREATE TABLE
demo=# insert into t1 select i,'aaa' from generate_series(1,100) i;
INSERT 0 100
demo=# insert into t1 select i,'bbb' from generate_series(101,200) i;
INSERT 0 100
demo=# insert into t1 select i,'ccc' from generate_series(201,300) i;
INSERT 0 100
在有了數據之后,可以創建一個預處理語句(prepared statement),這個語句的結構是固定的,但允許在執行時插入不同的參數值。這樣可以實現代碼重用,并提高查詢的執行效率。
demo=# prepare pre_stmt as select * from t1 where b=$1;
PREPARE
在 PostgreSQL 中,預處理語句(prepared statements)會在當前會話中注冊。如果想查看當前會話中有哪些預處理語句已經被創建并可用,可以通過查詢系統視圖 pg_prepared_statements 來獲取這些信息。
name | statement | prepare_time | parameter_types | result_types | from_sql | generic_plans | custom_plans
----------+--------------------------------------------------+-------------------------------+-----------------+----------------+----------+---------------+--------------pre_stmt | prepare pre_stmt as select * from t1 where b=$1; | 2025-01-01 18:32:43.697846+07 | {text} | {integer,text} | t | 0 | 0
(1 row)
當我們對一個語句運行 EXPLAIN (ANALYZE) 時,可以看到 PostgreSQL 為該語句生成的實際執行計劃以及執行的統計信息。這包括查詢是如何被優化和執行的,例如是否使用了索引、查詢的執行時間、返回的行數等。
demo=# explain (analyze) execute pre_stmt ('aaa');QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.032..0.252 rows=100 loops=1)Filter: (b = 'aaa'::text)Rows Removed by Filter: 200Planning Time: 0.568 msExecution Time: 0.287 ms
(5 rows)
在執行計劃的“Filter”行中,第一次執行時,執行計劃會顯示你傳入的實際參數值(比如 ‘aaa’),這時是 自定義計劃(custom plan)。但是,當你多次執行這個預處理語句時,執行計劃中filter行的參數值會變成顯示占位符(如 $1),這時是 通用計劃(generic plan)。
demo=# explain (analyze) execute pre_stmt ('aaa');QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.107..0.255 rows=100 loops=1)Filter: (b = 'aaa'::text)Rows Removed by Filter: 200Planning Time: 0.106 msExecution Time: 0.308 ms
(5 rows)demo=# explain (analyze) execute pre_stmt ('aaa');QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.045..0.255 rows=100 loops=1)Filter: (b = 'aaa'::text)Rows Removed by Filter: 200Planning Time: 0.144 msExecution Time: 0.297 ms
(5 rows)demo=# explain (analyze) execute pre_stmt ('aaa');QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.049..0.259 rows=100 loops=1)Filter: (b = 'aaa'::text)Rows Removed by Filter: 200Planning Time: 0.293 msExecution Time: 0.309 ms
(5 rows)demo=# explain (analyze) execute pre_stmt ('aaa');QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.041..0.248 rows=100 loops=1)Filter: (b = 'aaa'::text)Rows Removed by Filter: 200Planning Time: 0.279 msExecution Time: 0.289 ms
(5 rows)demo=# explain (analyze) execute pre_stmt ('aaa');QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.034..0.215 rows=100 loops=1)Filter: (b = $1)Rows Removed by Filter: 200Planning Time: 0.101 msExecution Time: 0.245 ms
(5 rows)
在執行查詢時,最開始會看到執行計劃中顯示實際傳入的參數值,但隨著執行次數的增加,執行計劃會變成使用占位符(例如 $1)來表示參數。此時,查詢使用的是通用計劃。通用計劃一旦生成后,無論傳入什么不同的參數值,它的執行計劃都不會再發生變化,直到該預處理語句結束。
demo=# explain (analyze) execute pre_stmt ( 'bbb' );QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.111..0.367 rows=100 loops=1)Filter: (b = $1)Rows Removed by Filter: 200Planning Time: 0.181 msExecution Time: 0.412 ms
(5 rows)demo=# explain (analyze) execute pre_stmt ( 'ccc' );QUERY PLAN
------------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.165..0.249 rows=100 loops=1)Filter: (b = $1)Rows Removed by Filter: 200Planning Time: 0.025 msExecution Time: 0.289 ms
(5 rows)demo=# explain (analyze) execute pre_stmt ( null );QUERY PLAN
----------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.75 rows=100 width=8) (actual time=0.232..0.233 rows=0 loops=1)Filter: (b = $1)Rows Removed by Filter: 300Planning Time: 0.022 msExecution Time: 0.255 ms
(5 rows)
如果你查看 PostgreSQL 的源代碼(特別是 src/backend/utils/cache/plancache.c 文件),你將能理解為什么在執行 5 次之后,執行計劃會發生變化。
這個文件包含了與查詢計劃緩存相關的邏輯,解釋了為什么預處理語句在執行多次后會從“自定義計劃”變為“通用計劃”。
/** choose_custom_plan: choose whether to use custom or generic plan** This defines the policy followed by GetCachedPlan.*/
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{double avg_custom_cost;/* One-shot plans will always be considered custom */if (plansource->is_oneshot)return true;/* Otherwise, never any point in a custom plan if there's no parameters */if (boundParams == NULL)return false;/* ... nor for transaction control statements */if (IsTransactionStmtPlan(plansource))return false;/* See if caller wants to force the decision */if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)return false;if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)return true;/* Generate custom plans until we have done at least 5 (arbitrary) */if (plansource->num_custom_plans < 5)return true;
一旦 PostgreSQL 使用了通用計劃,即使數據改變并重新分析表,執行計劃也不會再改變。
demo=# insert into t1 select i,'ddd' from generate_series(201,210) i;
INSERT 0 10
demo=# insert into t1 select i,'eee' from generate_series(211,211) i;
INSERT 0 1
demo=# analyze t1;
ANALYZE
demo=# select b,count(*) from t1 group by b order by b;b | count
-----+-------aaa | 100bbb | 100ccc | 100ddd | 10eee | 1
(5 rows)
demo=# explain (analyze) execute pre_stmt('ddd');QUERY PLAN
----------------------------------------------------------------------------------------------Seq Scan on t1 (cost=0.00..5.89 rows=62 width=8) (actual time=0.059..0.070 rows=10 loops=1)Filter: (b = $1)Rows Removed by Filter: 301Planning Time: 0.021 msExecution Time: 0.184 ms
(5 rows)
當數據量增加、數據分布不均勻,并且在某一列(比如“b”列)上建立了索引時,查詢的執行計劃可能會發生變化,PostgreSQL 會根據這些因素選擇不同的執行策略。
demo=# insert into t1 select i, 'aaa' from generate_series (1,2000000) i;
INSERT 0 1000000
demo=# insert into t1 select i, 'bbb' from generate_series (1000001,3000000) i;
INSERT 0 1000000
demo=# insert into t1 select i, 'ccc' from generate_series (2000001,3000000) i;
INSERT 0 1000000
demo=# insert into t1 select i, 'eee' from generate_series (3000001,3000010) i;
INSERT 0 10
demo=# create index idx_b on t1(b);
CREATE INDEX
demo=# select b,count(*) from t1 group by b order by b;b | count
-----+---------aaa | 1000000bbb | 1000000ccc | 1000000eee | 10(4 rows)
無論我們執行這個查詢多少次(查詢的是參數‘eee’),PostgreSQL 都不會使用通用計劃,而是會持續使用自定義計劃。
demo=# explain (analyze) execute pre_stmt('eee');QUERY PLAN
-----------------------------------------------------------------------------------------------------------Index Scan using idx_b on t1 (cost=0.43..4.45 rows=1 width=8) (actual time=0.039..0.048 rows=10 loops=1)Index Cond: (b = 'eee'::text)Planning Time: 0.287 msExecution Time: 0.076 ms
(4 rows)
------>重復執行多次,但至少執行 10 次。
demo=# explain (analyze) execute pre_stmt('eee');QUERY PLAN
-----------------------------------------------------------------------------------------------------------Index Scan using idx_b on t1 (cost=0.43..4.45 rows=1 width=8) (actual time=0.037..0.045 rows=10 loops=1)Index Cond: (b = 'eee'::text)Planning Time: 0.137 msExecution Time: 0.066 ms
(4 rows)
在某些情況下,當數據分布不均勻時(比如某列中有很多重復值,只有少數值是稀有的,比如上面例子中出現的b=‘eee’),即使考慮到重新生成執行計劃的開銷,自定義計劃的執行成本仍然比通用計劃低。這樣,PostgreSQL 會優先選擇使用自定義計劃,而不會使用通用計劃,甚至在重新規劃的開銷考慮進去之后,通用計劃也可能永遠不會被使用。
順便提一下啊,在postgresql后,explain增加了generic_plan選型來供我們方便的查看通用執行計劃
demo=# EXPLAIN (GENERIC_PLAN) SELECT * FROM t1 WHERE t1 = $1;QUERY PLAN
------------------------------------------------------------------------Gather (cost=1000.00..31400.05 rows=15000 width=8)Workers Planned: 2-> Parallel Seq Scan on t1 (cost=0.00..28900.05 rows=6250 width=8)Filter: (t1.* = $1)
(4 rows)
總的來說,PostgreSQL 15 引入了 EXPLAIN (GENERIC_PLAN) 選項,使得開發者可以明確地查看通用計劃,而不受參數變化的影響。
如果覺得文章有點價值,請幫忙點關注,謝謝