相關
《Postgresql源碼(133)優化器動態規劃生成連接路徑的實例分析》
1 問題
最近遇到一個問題,評估行數和真實行數存在較大差距,導致計劃不準的問題。
nestloop內表評估是根據外表的參數來的。因為外表驅動表每取一條,內表才能做查詢。所以這里外表的一條數據對內表來說就是參數。內表使用參數化路徑來評估行數。
本篇針對這個實例對參數化路徑行數評估做一些分析。
postgres=# explain analyze select * from iii, mmm where iii.poid = mmm.poid;QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=0.43..125.97 rows=17 width=27) (actual time=0.063..6.031 rows=1000 loops=1)-> Seq Scan on mmm (cost=0.00..1.10 rows=10 width=13) (actual time=0.005..0.010 rows=10 loops=1)-> Index Scan using idx_iii_poid on iii (cost=0.43..12.47 rows=2 width=14) (actual time=0.022..0.563 rows=100 loops=10)Index Cond: (poid = mmm.poid)Planning Time: 0.351 msExecution Time: 6.138 ms
(6 rows)
2 全文總結(便于查詢)
文章結論
- get_parameterized_baserel_size計算參數化路徑的選擇率,本例中并沒有看到任何特殊的評估方法,完全是按照poid的選擇率來評估的。
clauselist_selectivity = 1.7108151058814915e-07
rel->tuples = 10000001nrows = rel->tuples * clauselist_selectivity = 2
- pg_statistic記錄的poid列唯一值個數占比
-0.58516644
- pg_class記錄行數為
10000001
條,所以評估poid列的唯一值個數為10000001*0.58516644
=5845167
(實際9901001
)。這里有一半的偏差,但不是主要原因。 2行
=10000001*1.71e-07
,在一個均勻分布的數據集中,每一個唯一值出現的概率是1.71e-07。所以如果有1千萬行數據,隨便給一個poid,能選出來兩行不一樣的。- 如果唯一值非常少,那么選擇率會變大,趨近于1,則
10000001行
=10000001*1
。隨便給一個poid,能選出來會非常多。 - 如果唯一值非常多,那么選擇率會變小,趨近于0,則
0行
=10000001*0
。隨便給一個poid,能選出來會非常少。
- 如果唯一值非常少,那么選擇率會變大,趨近于1,則
- 評估兩行但實際是100行的原因是,數據不均勻但統計信息平均采樣,導致pg_statistic中stadistinct不準,進一步導致選擇率不準,導致計算出現偏差。
分析思路整理
- standard_join_search逐層規劃連接順序,找到參數化path節點。
1.1.1 PATH 1:用索引參數化路徑,評估2行
- create_index_path在make_one_rel→set_base_rel_pathlists中一共調用6次,其中一次是參數化路徑,特點是入參required_outer有值,可能拿到外表信息。
- 進一步分析create_index_path參數化成本計算,create_index_path→get_baserel_parampathinfo函數生成ParamPathInfo,ParamPathInfo中ppi_rows記錄估算函數。
- 進一步分析ppi_rows的計算,在create_index_path→get_baserel_parampathinfo中處理,get_baserel_parampathinfo用行數1千萬乘以選擇率得到行數。
- 進一步分析選擇率的計算
- clauselist_selectivity → clause_selectivity_ext → restriction_selectivity → eqsel_internal → var_eq_non_const
- 使用
stadistinct = stats->stadistinct = -0.58516644
(來自pg_statistic表) - 使用
ntuples = vardata->rel->tuples = 10000001
(來自relation結構,來自pg_class表) - 計算唯一行數:
clamp_row_est(-stadistinct * ntuples = 5845167.024516644) = 5845167
- 進而得到選擇率selec = selec / ndistinct = 1 / 5845167 = 1.7108151058814915e-07
實例
-
計劃二:優化器認為驅動表每一行,內表有2行能連接上,實際有100行能連接上。為什么差距大?
-
(cost=0.43..12.47 rows=2 width=14) (actual time=0.022..0.563 rows=100 loops=10)
評估不準的原因是什么?
-- 計劃二
drop table iii;
CREATE TABLE iii (poid INT NOT NULL, value NUMERIC, status int);
-- 可以和mmm表連上
INSERT INTO iii SELECT t%1000, t, 0 FROM generate_series(1, 100000) t order by random();
-- 干擾數據,占比高擔都和mmm表連不上
INSERT INTO iii SELECT t, t, 0 FROM generate_series(100000, 10000000) t order by random();
CREATE INDEX idx_iii_poid ON iii(poid);
analyze iii;drop table mmm;
CREATE TABLE mmm (poid INT NOT NULL, value NUMERIC, status int);
INSERT INTO mmm SELECT t, t, 0 FROM generate_series(1, 10) t order by random();
CREATE INDEX idx_mmm_poid ON mmm(poid);
analyze mmm;set enable_hashjoin to off;
set enable_mergejoin to off;
set enable_bitmapscan to off;
explain analyze select * from iii, mmm where iii.poid = mmm.poid;postgres=# explain analyze select * from iii, mmm where iii.poid = mmm.poid;QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=0.43..125.97 rows=17 width=27) (actual time=0.063..6.031 rows=1000 loops=1)-> Seq Scan on mmm (cost=0.00..1.10 rows=10 width=13) (actual time=0.005..0.010 rows=10 loops=1)-> Index Scan using idx_iii_poid on iii (cost=0.43..12.47 rows=2 width=14) (actual time=0.022..0.563 rows=100 loops=10)Index Cond: (poid = mmm.poid)Planning Time: 0.351 msExecution Time: 6.138 ms
(6 rows)
要分析這個問題需要從path生成開始看:
path生成
1 standard_join_search第一層
1.1 第一層第一個RelOptInfo(iii表)
p ((RelOptInfo*)root->join_rel_level[1].elements[0].ptr_value).pathlist
1.1.1 PATH 1:用索引參數化路徑,評估2行,代價total_cost = 12.466928526553348
$53 = {path = {type = T_IndexPath,pathtype = T_IndexScan,parent = 0x1e91558,pathtarget = 0x1e697c8,param_info = 0x1e8d340,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 2,startup_cost = 0.435,total_cost = 12.466928526553348,pathkeys = 0x1e8cf90},indexinfo = 0x1e91768,indexclauses = 0x1e8cef0,indexorderbys = 0x0,indexorderbycols = 0x0,indexscandir = ForwardScanDirection,indextotalcost = 4.4499999999999993,indexselectivity = 1.7108151058814915e-07
}
param_info = 0x1e8d340 記錄了什么?
1.1.2 PATH 2:全表掃,代價total_cost = 154055.01000000001
(gdb) tr Path 0x1e8bbc0
$55 = {type = T_Path,pathtype = T_SeqScan,parent = 0x1e91558,pathtarget = 0x1e697c8,param_info = 0x0,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 10000001,startup_cost = 0,total_cost = 154055.01000000001,pathkeys = 0x0
}
1.1.3 PATH 3:用索引全表掃iii表,評估10000001行。代價total_cost = 475019.93093073595
$57 = {path = {type = T_IndexPath,pathtype = T_IndexScan,parent = 0x1e91558,pathtarget = 0x1e697c8,param_info = 0x0,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 10000001,startup_cost = 0.435,total_cost = 475019.93093073595,pathkeys = 0x1e8c7d0},indexinfo = 0x1e91768,indexclauses = 0x0,indexorderbys = 0x0,indexorderbycols = 0x0,indexscandir = ForwardScanDirection,indextotalcost = 158924.44,indexselectivity = 1
}
1.2 第一層第二個RelOptInfo(mmm表)
p ((RelOptInfo*)root->join_rel_level[1].elements[1].ptr_value).pathlist
1.2.1 PATH 1:用索引參數化路徑,評估1行,代價total_cost = 0.15250119999987999
$70 = {path = {type = T_IndexPath,pathtype = T_IndexScan,parent = 0x1e90d28,pathtarget = 0x1e698f8,param_info = 0x1eac2d8,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 1,startup_cost = 0.13500000000000001,total_cost = 0.15250119999987999,pathkeys = 0x1eabfd8},indexinfo = 0x1e8b8c8,indexclauses = 0x1eabf38,indexorderbys = 0x0,indexorderbycols = 0x0,indexscandir = ForwardScanDirection,indextotalcost = 0.14250079999991999,indexselectivity = 0.10000000000000001
}
1.2.2 PATH 2:全表掃,評估10行,代價total_cost = 1.1000000000000001
$71 = {type = T_Path,pathtype = T_SeqScan,parent = 0x1e90d28,pathtarget = 0x1e698f8,param_info = 0x0,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 10,startup_cost = 0,total_cost = 1.1000000000000001,pathkeys = 0x0
}
1.2.3 PATH 3:索引全表掃,評估10行,代價otal_cost = 12.285
{path = {type = T_IndexPath,pathtype = T_IndexScan,parent = 0x1e90d28,pathtarget = 0x1e698f8,param_info = 0x0,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 10,startup_cost = 0.13500000000000001,total_cost = 12.285,pathkeys = 0x1eab998},indexinfo = 0x1e8b8c8,indexclauses = 0x0,indexorderbys = 0x0,indexorderbycols = 0x0,indexscandir = ForwardScanDirection,indextotalcost = 8.1850000000000005,indexselectivity = 1
}
2 standard_join_search第二層,只有一個RelOptInfo
p ((RelOptInfo*)root->join_rel_level[2].elements[0].ptr_value).pathlist
只有一個PATH
$91 = {jpath = {path = {type = T_NestPath,pathtype = T_NestLoop,parent = 0x1eacc28,pathtarget = 0x1eace58,param_info = 0x0,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 17,startup_cost = 0.435,total_cost = 125.96928526553347,pathkeys = 0x0},jointype = JOIN_INNER,inner_unique = false,outerjoinpath = 0x1e8d780, innerjoinpath = 0x1e8d020,joinrestrictinfo = 0x0}
}
- outerjoinpath = 0x1e8d780:選擇了1.2.2 PATH 2:全表掃,評估10行,代價total_cost = 1.1000000000000001。
- innerjoinpath = 0x1e8d020:選擇了1.1.1 PATH 1:用索引參數化路徑,評估2行,代價total_cost = 12.466928526553348。
為什么iii表的索引參數化路徑評估只有2行?
- 現階段只有nestloop的內表需要參數化路徑,因為內表評估行數時,無法確切知道能連上多少行,所以只能計算出一個行數。
- 在上文
1.1.1 PATH 1:用索引參數化路徑,評估2行,代價total_cost = 12.466928526553348
中,參數化node評估的行數是怎么計算出來的?
$53 = {path = {type = T_IndexPath,pathtype = T_IndexScan,parent = 0x1e91558,pathtarget = 0x1e697c8,param_info = 0x1e8d340,parallel_aware = false,parallel_safe = true,parallel_workers = 0,rows = 2,startup_cost = 0.435,total_cost = 12.466928526553348,pathkeys = 0x1e8cf90},indexinfo = 0x1e91768,indexclauses = 0x1e8cef0,indexorderbys = 0x0,indexorderbycols = 0x0,indexscandir = ForwardScanDirection,indextotalcost = 4.4499999999999993,indexselectivity = 1.7108151058814915e-07
}
create_index_path在make_one_rel→set_base_rel_pathlists中一共調用6次:
-- iii表的索引 idx_iii_poid,非參數化路徑
create_index_path (root=0x1e68cb8, index=0x1e91768, indexclauses=0x0, indexorderbys=0x0, indexorderbycols=0x0, pathkeys=0x1e8c7d0, indexscandir=ForwardScanDirection, indexonly=false, required_outer=0x0, loop_count=1, partial_path=false)
-- iii表的索引 idx_iii_poid,非參數化路徑
create_index_path (root=0x1e68cb8, index=0x1e91768, indexclauses=0x0, indexorderbys=0x0, indexorderbycols=0x0, pathkeys=0x1e8c7d0, indexscandir=ForwardScanDirection, indexonly=false, required_outer=0x0, loop_count=1, partial_path=true)
-- iii表的索引 idx_iii_poid,required_outer有值,計算參數化路徑
create_index_path (root=0x1e68cb8, index=0x1e91768, indexclauses=0x1e8cef0, indexorderbys=0x0, indexorderbycols=0x0, pathkeys=0x1e8cf90, indexscandir=ForwardScanDirection, indexonly=false, required_outer=0x1e8cca0, loop_count=10, partial_path=false)-- mmm表的索引 idx_mmm_poid
create_index_path (root=0x1e68cb8, index=0x1e8b8c8, indexclauses=0x0, indexorderbys=0x0, indexorderbycols=0x0, pathkeys=0x1eab998, indexscandir=ForwardScanDirection, indexonly=false, required_outer=0x0, loop_count=1, partial_path=false)
create_index_path (root=0x1e68cb8, index=0x1e8b8c8, indexclauses=0x0, indexorderbys=0x0, indexorderbycols=0x0, pathkeys=0x1eab998, indexscandir=ForwardScanDirection, indexonly=false, required_outer=0x0, loop_count=1, partial_path=true)
create_index_path (root=0x1e68cb8, index=0x1e8b8c8, indexclauses=0x1eabf38, indexorderbys=0x0, indexorderbycols=0x0, pathkeys=0x1eabfd8, indexscandir=ForwardScanDirection, indexonly=false, required_outer=0x1e8d2d0, loop_count=10000001, partial_path=false)
- required_outer有值,表示參數化的indexpath:
-- iii表的索引 idx_iii_poid,required_outer有值,計算參數化路徑
create_index_path (root=0x1e68cb8, index=0x1e91768, indexclauses=0x1e8cef0, indexorderbys=0x0, indexorderbycols=0x0, pathkeys=0x1e8cf90, indexscandir=ForwardScanDirection, indexonly=false, required_outer=0x1e8cca0, loop_count=10, partial_path=false)
參數化路徑計算
create_index_path→get_baserel_parampathinfo函數生成ParamPathInfo
get_baserel_parampathinfo調用get_parameterized_baserel_size評估,當前節點是幾行。例如評估出來2行。explain時內表看到的rows就是2行,含義是外表每一行過來,內表有兩行能連上。
get_parameterized_baserel_size
double
get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,List *param_clauses)
{List *allclauses;double nrows;allclauses = list_concat_copy(param_clauses, rel->baserestrictinfo);nrows = rel->tuples *clauselist_selectivity(root,allclauses,rel->relid, /* do not use 0! */JOIN_INNER,NULL);nrows = clamp_row_est(nrows);/* For safety, make sure result is not more than the base estimate */if (nrows > rel->rows)nrows = rel->rows;return nrows;
}
- allclauses合并param_clauses參數化條件和rel->baserestrictinfo當前表自身的約束條件,得到綜合條件列表??。
- 使用 clauselist_selectivity 函數計算這些條件的綜合選擇率,再乘以表的基數rel->tuples,最終得到參數化路徑下的預估行數。
當前的allclauses表本身沒有條件,只有外表提供的參數化條件:
-
上圖中opexpr是等號,指向varno=1、varno=2指向兩個基表。
-
nrows = rel->tuples * clauselist_selectivity = 2
- clauselist_selectivity = 1.7108151058814915e-07
- rel->tuples = 10000001
clauselist_selectivity → clause_selectivity_ext的計算邏輯分支較多,這里只給出本例走到的分支:
clause_selectivity_ext......clause = (Node *) rinfo->clause;......else if (is_opclause(clause) || IsA(clause, DistinctExpr))......restriction_selectivity...
restriction_selectivity用于計算條件(where中過濾條件)的選擇率:
Selectivity
restriction_selectivity(PlannerInfo *root,Oid operatorid,List *args,Oid inputcollid,int varRelid)
{
- 這里operatorid=96是等號。
- get_oprrest從系統表中拿到oprrest 字段,字段指向一個注冊的選擇率計算函數oid=101,eqsel函數。
RegProcedure oprrest = get_oprrest(operatorid);float8 result;/** if the oprrest procedure is missing for whatever reason, use a* selectivity of 0.5*/if (!oprrest)return (Selectivity) 0.5;result = DatumGetFloat8(OidFunctionCall4Coll(oprrest,inputcollid,PointerGetDatum(root),ObjectIdGetDatum(operatorid),PointerGetDatum(args),Int32GetDatum(varRelid)));if (result < 0.0 || result > 1.0)elog(ERROR, "invalid restriction selectivity: %f", result);return (Selectivity) result;
}
進入eqsel函數開始計算選擇率,eqsel用于計算等值和??不等值操作符的選擇率,是優化器代價模型的核心組成部分。
- root:優化器上下文(包含統計信息、表元數據等)
- operator:操作符的 OID(如 = 或 <>)
- args:操作符的參數列表(如 a = 5 中的列 a 和常量 5)
- varRelid:關聯的關系 ID(用于確定統計信息范圍)
- collation:排序規則(影響字符串比較)
/** Common code for eqsel() and neqsel()*/
static double
eqsel_internal(PG_FUNCTION_ARGS, bool negate)
{PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);Oid operator = PG_GETARG_OID(1);List *args = (List *) PG_GETARG_POINTER(2);int varRelid = PG_GETARG_INT32(3);Oid collation = PG_GET_COLLATION();VariableStatData vardata;Node *other;bool varonleft;double selec;
negate表示不等值。
if (negate){operator = get_negator(operator);if (!OidIsValid(operator)){/* Use default selectivity (should we raise an error instead?) */return 1.0 - DEFAULT_EQ_SEL;}}
本例中的args:
- Var = {xpr = {type = T_Var}, varno = 1, varattno = 1, vartype = 23}
- Var = {xpr = {type = T_Var}, varno = 2, varattno = 1, vartype = 23}
- varRelid = 1
/** If expression is not variable = something or something = variable, then* punt and return a default estimate.*/if (!get_restriction_variable(root, args, varRelid,&vardata, &other, &varonleft))return negate ? (1.0 - DEFAULT_EQ_SEL) : DEFAULT_EQ_SEL;
get_restriction_variable將條件拆分為變量和??常量或其他表達式??。
- vardata
- {var = 0x1e6b2c0, rel = 0x1e8b8f8, statsTuple = 0x7f4874e75e08, freefunc = 0xb73cd6 , vartype = 23, atttype = 23, atttypmod = -1, isunique = false, acl_ok = true}
- var = {xpr = {type = T_Var}, varno = 1, varattno = 1, vartype = 23}
- statsTuple = pg_stat_statistic對應的行。
- {var = 0x1e6b2c0, rel = 0x1e8b8f8, statsTuple = 0x7f4874e75e08, freefunc = 0xb73cd6 , vartype = 23, atttype = 23, atttypmod = -1, isunique = false, acl_ok = true}
- other = {{xpr = {type = T_Var}, varno = 2, varattno = 1, vartype = 2}
- varonleft = true
開始計算選擇率 var_eq_non_const
/** We can do a lot better if the something is a constant. (Note: the* Const might result from estimation rather than being a simple constant* in the query.)*/if (IsA(other, Const))selec = var_eq_const(&vardata, operator, collation,((Const *) other)->constvalue,((Const *) other)->constisnull,varonleft, negate);elseselec = var_eq_non_const(&vardata, operator, collation, other,varonleft, negate);ReleaseVariableStats(vardata);return selec;
}
var_eq_non_const用于計算變量與非常量表達式(如其他列或子查詢結果)的等值條件(= 或 <>)的選擇率??,例如 a = b 或 x <> y。核心邏輯是基于統計信息(如唯一性約束、NULL值比例、不同值數量等),結合啟發式規則估算滿足條件的行數比例
- vardata:變量統計信息(如列 a 的直方圖、MCV列表等)。
- oproid:操作符OID(如 = 或 <>)。
- other:非常量表達式(如另一列 b 或表達式 b + 1)。
- negate:不等。
/** var_eq_non_const --- eqsel for var = something-other-than-const case** This is exported so that some other estimation functions can use it.*/
double
var_eq_non_const(VariableStatData *vardata, Oid oproid, Oid collation,Node *other,bool varonleft, bool negate)
{double selec;double nullfrac = 0.0;bool isdefault;
先拿到pg_statistic表對應的行。
if (HeapTupleIsValid(vardata->statsTuple)){Form_pg_statistic stats;stats = (Form_pg_statistic) GETSTRUCT(vardata->statsTuple);nullfrac = stats->stanullfrac;}...else if (HeapTupleIsValid(vardata->statsTuple)){double ndistinct;AttStatsSlot sslot;
基于統計信息估算:
- 計算非NULL值比例 1 - nullfrac。
- 通過 get_variable_numdistinct 獲取變量不同值數量 ndistinct。
- 假設每個不同值均勻分布,選擇率為 (1 - nullfrac) / ndistinct。
- 與MCV(最常見值)的最大頻率比較,避免高估。
selec = 1.0 - nullfrac;ndistinct = get_variable_numdistinct(vardata, &isdefault);
get_variable_numdistinct計算結果:5845167
stadistinct = stats->stadistinct = -0.58516644
(來自pg_statistic表)stanullfrac = 0
ntuples = vardata->rel->tuples = 10000001
(來自relation結構,來自pg_class表)- 計算結果:
clamp_row_est(-stadistinct * ntuples = 5845167.024516644) = 5845167
get_variable_numdistinct計算方法總結:pg_statistic表的stadistinct中取得選擇率 乘以 評估行數,等于評估出來多少個唯一值。stadistinct含義:stadistinct > 0表示該列中非空唯一值的實際數量。stadistinct < 0表示非空唯一值的數量占總行數的比例。stadistinct = 0表示無法確定該列的非空唯一值數量。
if (ndistinct > 1)selec /= ndistinct;......return selec;
計算得到選擇率
selec = selec / ndistinct = 1 / 5845167 = 1.7108151058814915e-07
回到最初的行數評估函數get_parameterized_baserel_size中:
- nrows = rel->tuples * clauselist_selectivity = 2
- clauselist_selectivity = 1.7108151058814915e-07
- rel->tuples = 10000001
總結:
- pg_statistic記錄的poid列唯一值個數占比
-0.58516644
- pg_class記錄行數為
10000001
條,所以評估poid列的唯一值個數為10000001*0.58516644
=5845167
(實際9901001
)。這里有一半的偏差,但不是主要原因。 2行
=10000001*1.71e-07
,在一個均勻分布的數據集中,每一個唯一值出現的概率是1.71e-07。所以如果有1千萬行數據,隨便給一個poid,能選出來兩行不一樣的。- 如果唯一值非常少,那么選擇率會變大,趨近于1,則
10000001行
=10000001*1
。隨便給一個poid,能選出來會非常多。 - 如果唯一值非常多,那么選擇率會變小,趨近于0,則
0行
=10000001*0
。隨便給一個poid,能選出來會非常少。
- 如果唯一值非常少,那么選擇率會變大,趨近于1,則
- 評估兩行但實際是100行的原因是,pg_statistic中stadistinct記錄的連接條件poid列的唯一值。
create_index_path→cost_index計算參數化路徑評估行數(從上一步的ParamPathInfo結果中取得)
- cost_index中如果發現參數化路徑,會從ParamPathInfo中取評估行數
- 本例中path->path.param_info->ppi_rows=2。
- ??param_info存在表示當前索引掃描路徑是參數化的,依賴外部循環(如嵌套循環連接的外層表)提供的參數值。
- ??行數估算??:ppi_rows 表示參數化路徑的預估行數,這是優化器根據外層表(如嵌套循環中的驅動表)的約束條件和連接關系動態調整的結果,比如ppi_rows=2表示,優化器評估外表每一條,內表有兩條能連得上。
void
cost_index(IndexPath *path, PlannerInfo *root, double loop_count,bool partial_path)......if (path->path.param_info){path->path.rows = path->path.param_info->ppi_rows;/* qpquals come from the rel's restriction clauses and ppi_clauses */qpquals = list_concat(extract_nonindex_conditions(path->indexinfo->indrestrictinfo,path->indexclauses),extract_nonindex_conditions(path->path.param_info->ppi_clauses,path->indexclauses));}else{path->path.rows = baserel->rows;/* qpquals come from just the rel's restriction clauses */qpquals = extract_nonindex_conditions(path->indexinfo->indrestrictinfo,path->indexclauses);}...