PostgreSQL 統計信息pg_statistic格式及導入導出dump_stat

標簽

PostgreSQL , dump_stat , 統計信息 , 導出導入

背景

《PostgreSQL 規格評估 - 微觀、宏觀、精準多視角估算數據庫性能(選型、做預算不求人)》

EXPLAIN是PG數據庫用于輸出SQL執行計劃的語法，

1、生成的執行計劃中包含COST一項。

如果校準了成本因子，COST可以和SQL實際執行時間對其。因子校對的方法如下，實際上每一種硬件，我們只需要校對一遍即可。

《優化器成本因子校對(disk,ssd,memory IO開銷精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《優化器成本因子校對 - PostgreSQL explain cost constants alignment to timestamp》

校對因子如下：

#seq_page_cost = 1.0                    # measured on an arbitrary scale    
random_page_cost = 1.2                  # same scale as above    
#cpu_tuple_cost = 0.01                  # same scale as above    
#cpu_index_tuple_cost = 0.005           # same scale as above    
#cpu_operator_cost = 0.0025             # same scale as above    # 以下不需要校對, 不考慮并行計算SQL    
parallel_tuple_cost = 0.1               # same scale as above    
parallel_setup_cost = 1000.0            # same scale as above    
effective_cache_size = 10GB

2、評估COST還需要依賴統計信息柱狀圖：

涉及reltuples, relpages. 表示評估的記錄數以及占用多少個數據塊。注意源頭的block_size可能和PG的不一致，占用多少個塊需要轉換一下。(show block_size可以查看數據塊大小。)   postgres=# \d pg_class    Table "pg_catalog.pg_class"    Column        |     Type     | Collation | Nullable | Default     
---------------------+--------------+-----------+----------+---------    relname             | name         |           | not null |   -- 對象名  relnamespace        | oid          |           | not null |   -- 對象所屬的schema, 對應pg_namespace.oid  reltype             | oid          |           | not null |     reloftype           | oid          |           | not null |     relowner            | oid          |           | not null |     relam               | oid          |           | not null |     relfilenode         | oid          |           | not null |     reltablespace       | oid          |           | not null |     relpages            | integer      |           | not null |   -- 評估的頁數(單位為block_size)  reltuples           | real         |           | not null |   -- 評估的記錄數  relallvisible       | integer      |           | not null |     reltoastrelid       | oid          |           | not null |     relhasindex         | boolean      |           | not null |     relisshared         | boolean      |           | not null |     relpersistence      | "char"       |           | not null |     relkind             | "char"       |           | not null |     relnatts            | smallint     |           | not null |     relchecks           | smallint     |           | not null |     relhasoids          | boolean      |           | not null |     relhaspkey          | boolean      |           | not null |     relhasrules         | boolean      |           | not null |     relhastriggers      | boolean      |           | not null |     relhassubclass      | boolean      |           | not null |     relrowsecurity      | boolean      |           | not null |     relforcerowsecurity | boolean      |           | not null |     relispopulated      | boolean      |           | not null |     relreplident        | "char"       |           | not null |     relispartition      | boolean      |           | not null |     relfrozenxid        | xid          |           | not null |     relminmxid          | xid          |           | not null |     relacl              | aclitem[]    |           |          |     reloptions          | text[]       |           |          |     relpartbound        | pg_node_tree |           |          |     
Indexes:    "pg_class_oid_index" UNIQUE, btree (oid)    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)    涉及   空值比例、平均列寬、唯一值比例或個數、高頻值以及頻率、柱狀圖分布、存儲相關性、多值列（高頻元素及比例、元素柱狀圖分布）。    -- 這個是視圖：  postgres=# \d pg_stats     View "pg_catalog.pg_stats"    Column         |   Type   | Collation | Nullable | Default     
------------------------+----------+-----------+----------+---------    schemaname             | name     |           |          |   -- 對象所屬的schema  tablename              | name     |           |          |   -- 對象名  attname                | name     |           |          |   -- 列名  inherited              | boolean  |           |          |   -- 是否為繼承表的統計信息（false時表示當前表的統計信息，true時表示包含所有繼承表的統計信息）  null_frac              | real     |           |          |   -- 該列空值比例  avg_width              | integer  |           |          |   -- 該列平均長度  n_distinct             | real     |           |          |   -- 該列唯一值個數（-1表示唯一，小于1表示占比，大于等于1表示實際的唯一值個數）  most_common_vals       | anyarray |           |          |   -- 該列高頻詞  most_common_freqs      | real[]   |           |          |   -- 該列高頻詞對應的出現頻率  histogram_bounds       | anyarray |           |          |   -- 該列柱狀圖（表示隔出的每個BUCKET的記錄數均等）  correlation            | real     |           |          |   -- 該列存儲相關性（-1到1的區間），絕對值越小，存儲越離散。小于0表示反向相關，大于0表示正向相關  most_common_elems      | anyarray |           |          |   -- 該列為多值類型（數組）時，多值元素的高頻詞  most_common_elem_freqs | real[]   |           |          |   -- 多值元素高頻詞的出現頻率  elem_count_histogram   | real[]   |           |          |   -- 多值元素的柱狀圖中，每個區間的非空唯一元素個數  -- 這個是實際存儲的數據(也就是要導入的部分)：  
-- https://www.postgresql.org/docs/10/static/catalog-pg-statistic.html  postgres=# \d pg_statistic    Table "pg_catalog.pg_statistic"    Column    |   Type   | Collation | Nullable | Default     
-------------+----------+-----------+----------+---------    starelid    | oid      |           | not null |   -- 對象OID，對應pg_class.oid  staattnum   | smallint |           | not null |   -- 該列在表中的位置序號，對應pg_attribute.attnum  stainherit  | boolean  |           | not null |   -- 是否為繼承表的統計信息（false時表示當前表的統計信息，true時表示包含所有繼承表的統計信息）  stanullfrac | real     |           | not null |   -- 空值比例  stawidth    | integer  |           | not null |   -- 平均長度  stadistinct | real     |           | not null |   -- 唯一值個數、比例  stakind1    | smallint |           | not null |   -- 表示第1個SLOT的統計信息分類編號  stakind2    | smallint |           | not null |   -- 表示第2個SLOT的統計信息分類編號  stakind3    | smallint |           | not null |   -- 表示第3個SLOT的統計信息分類編號  stakind4    | smallint |           | not null |   -- 表示第4個SLOT的統計信息分類編號  stakind5    | smallint |           | not null |   -- 表示第5個SLOT的統計信息分類編號  staop1      | oid      |           | not null |   -- 表示第1個SLOT的統計信息是用哪個operator生成的(例如統計柱狀圖邊界，需要用到 "<" 這個操作符)  staop2      | oid      |           | not null |   -- 表示第2個SLOT的統計信息是用哪個operator生成的(例如統計柱狀圖邊界，需要用到 "<" 這個操作符)  staop3      | oid      |           | not null |   -- 表示第3個SLOT的統計信息是用哪個operator生成的(例如統計柱狀圖邊界，需要用到 "<" 這個操作符)  staop4      | oid      |           | not null |   -- 表示第4個SLOT的統計信息是用哪個operator生成的(例如統計柱狀圖邊界，需要用到 "<" 這個操作符)  staop5      | oid      |           | not null |   -- 表示第5個SLOT的統計信息是用哪個operator生成的(例如統計柱狀圖邊界，需要用到 "<" 這個操作符)  stanumbers1 | real[]   |           |          |   -- 表示第1個SLOT的以numeric[]為結果的統計信息，NULL說明這個SLOT分類沒有numeric的統計信息。  stanumbers2 | real[]   |           |          |   -- 表示第2個SLOT的以numeric[]為結果的統計信息，NULL說明這個SLOT分類沒有numeric的統計信息。  stanumbers3 | real[]   |           |          |   -- 表示第3個SLOT的以numeric[]為結果的統計信息，NULL說明這個SLOT分類沒有numeric的統計信息。  stanumbers4 | real[]   |           |          |   -- 表示第4個SLOT的以numeric[]為結果的統計信息，NULL說明這個SLOT分類沒有numeric的統計信息。  stanumbers5 | real[]   |           |          |   -- 表示第5個SLOT的以numeric[]為結果的統計信息，NULL說明這個SLOT分類沒有numeric的統計信息。  stavalues1  | anyarray |           |          |   -- 表示第1個SLOT的以anyarray[]為結果的統計信息，NULL說明這個SLOT分類沒有anyarray[]統計信息。數組類型為列的元素類型，或者列本身的類型。  stavalues2  | anyarray |           |          |   -- 表示第2個SLOT的以anyarray[]為結果的統計信息，NULL說明這個SLOT分類沒有anyarray[]統計信息。數組類型為列的元素類型，或者列本身的類型。  stavalues3  | anyarray |           |          |   -- 表示第3個SLOT的以anyarray[]為結果的統計信息，NULL說明這個SLOT分類沒有anyarray[]統計信息。數組類型為列的元素類型，或者列本身的類型。  stavalues4  | anyarray |           |          |   -- 表示第4個SLOT的以anyarray[]為結果的統計信息，NULL說明這個SLOT分類沒有anyarray[]統計信息。數組類型為列的元素類型，或者列本身的類型。  stavalues5  | anyarray |           |          |   -- 表示第5個SLOT的以anyarray[]為結果的統計信息，NULL說明這個SLOT分類沒有anyarray[]統計信息。數組類型為列的元素類型，或者列本身的類型。  
Indexes:    "pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit)

statkind的定義

src/include/catalog/pg_statistic.h

/*  * Currently, five statistical slot "kinds" are defined by core PostgreSQL,  * as documented below.  Additional "kinds" will probably appear in  * future to help cope with non-scalar datatypes.  Also, custom data types  * can define their own "kind" codes by mutual agreement between a custom  * typanalyze routine and the selectivity estimation functions of the type's  * operators.  *  * Code reading the pg_statistic relation should not assume that a particular  * data "kind" will appear in any particular slot.  Instead, search the  * stakind fields to see if the desired data is available.  (The standard  * function get_attstatsslot() may be used for this.)  */  /*  * The present allocation of "kind" codes is:  *  *      1-99:           reserved for assignment by the core PostgreSQL project  *                              (values in this range will be documented in this file)  *      100-199:        reserved for assignment by the PostGIS project  *                              (values to be documented in PostGIS documentation)  *      200-299:        reserved for assignment by the ESRI ST_Geometry project  *                              (values to be documented in ESRI ST_Geometry documentation)  *      300-9999:       reserved for future public assignments  *  * For private use you may choose a "kind" code at random in the range  * 10000-30000.  However, for code that is to be widely disseminated it is  * better to obtain a publicly defined "kind" code by request from the  * PostgreSQL Global Development Group.  */  /*  * In a "most common values" slot, staop is the OID of the "=" operator  * used to decide whether values are the same or not.  stavalues contains  * the K most common non-null values appearing in the column, and stanumbers  * contains their frequencies (fractions of total row count).  The values  * shall be ordered in decreasing frequency.  Note that since the arrays are  * variable-size, K may be chosen by the statistics collector.  Values should  * not appear in MCV unless they have been observed to occur more than once;  * a unique column will have no MCV slot.  */  
#define STATISTIC_KIND_MCV      1  /*  * A "histogram" slot describes the distribution of scalar data.  staop is  * the OID of the "<" operator that describes the sort ordering.  (In theory,  * more than one histogram could appear, if a datatype has more than one  * useful sort operator.)  stavalues contains M (>=2) non-null values that  * divide the non-null column data values into M-1 bins of approximately equal  * population.  The first stavalues item is the MIN and the last is the MAX.  * stanumbers is not used and should be NULL.  IMPORTANT POINT: if an MCV  * slot is also provided, then the histogram describes the data distribution  * *after removing the values listed in MCV* (thus, it's a "compressed  * histogram" in the technical parlance).  This allows a more accurate  * representation of the distribution of a column with some very-common  * values.  In a column with only a few distinct values, it's possible that  * the MCV list describes the entire data population; in this case the  * histogram reduces to empty and should be omitted.  */  
#define STATISTIC_KIND_HISTOGRAM  2  /*  * A "correlation" slot describes the correlation between the physical order  * of table tuples and the ordering of data values of this column, as seen  * by the "<" operator identified by staop.  (As with the histogram, more  * than one entry could theoretically appear.)  stavalues is not used and  * should be NULL.  stanumbers contains a single entry, the correlation  * coefficient between the sequence of data values and the sequence of  * their actual tuple positions.  The coefficient ranges from +1 to -1.  */  
#define STATISTIC_KIND_CORRELATION      3  /*  * A "most common elements" slot is similar to a "most common values" slot,  * except that it stores the most common non-null *elements* of the column  * values.  This is useful when the column datatype is an array or some other  * type with identifiable elements (for instance, tsvector).  staop contains  * the equality operator appropriate to the element type.  stavalues contains  * the most common element values, and stanumbers their frequencies.  Unlike  * MCV slots, frequencies are measured as the fraction of non-null rows the  * element value appears in, not the frequency of all rows.  Also unlike  * MCV slots, the values are sorted into the element type's default order  * (to support binary search for a particular value).  Since this puts the  * minimum and maximum frequencies at unpredictable spots in stanumbers,  * there are two extra members of stanumbers, holding copies of the minimum  * and maximum frequencies.  Optionally, there can be a third extra member,  * which holds the frequency of null elements (expressed in the same terms:  * the fraction of non-null rows that contain at least one null element).  If  * this member is omitted, the column is presumed to contain no null elements.  *  * Note: in current usage for tsvector columns, the stavalues elements are of  * type text, even though their representation within tsvector is not  * exactly text.  */  
#define STATISTIC_KIND_MCELEM  4  /*  * A "distinct elements count histogram" slot describes the distribution of  * the number of distinct element values present in each row of an array-type  * column.  Only non-null rows are considered, and only non-null elements.  * staop contains the equality operator appropriate to the element type.  * stavalues is not used and should be NULL.  The last member of stanumbers is  * the average count of distinct element values over all non-null rows.  The  * preceding M (>=2) members form a histogram that divides the population of  * distinct-elements counts into M-1 bins of approximately equal population.  * The first of these is the minimum observed count, and the last the maximum.  */  
#define STATISTIC_KIND_DECHIST  5  /*  * A "length histogram" slot describes the distribution of range lengths in  * rows of a range-type column. stanumbers contains a single entry, the  * fraction of empty ranges. stavalues is a histogram of non-empty lengths, in  * a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range  * values that divide the column data values into M-1 bins of approximately  * equal population. The lengths are stored as float8s, as measured by the  * range type's subdiff function. Only non-null rows are considered.  */  
#define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM  6  /*  * A "bounds histogram" slot is similar to STATISTIC_KIND_HISTOGRAM, but for  * a range-type column.  stavalues contains M (>=2) range values that divide  * the column data values into M-1 bins of approximately equal population.  * Unlike a regular scalar histogram, this is actually two histograms combined  * into a single array, with the lower bounds of each value forming a  * histogram of lower bounds, and the upper bounds a histogram of upper  * bounds.  Only non-NULL, non-empty ranges are included.  */  
#define STATISTIC_KIND_BOUNDS_HISTOGRAM  7

那么這些統計信息如何導入導出呢？

導出導入統計信息

https://postgrespro.com/docs/postgresproee/9.6/dump-stat.html

dump_stat這個插件是PostgreSQL pro推出的兼容9.6版本的導出統計信息的插件。

代碼如下：

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6/contrib/dump_stat

導出

通過dump_stat導出（導出的結構已經是SQL形式），然后通過SQL導入。

$ psql test -A    
test=# \t    
test=# \o dump_stat.sql    
test=# select dump_statistic();

pg_statistic的每一條記錄，產生一條如下SQL：

WITH upsert as (    UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]    WHERE to_schema_qualified_relation(starelid) = t_relname    AND to_attname(t_relname, staattnum) = t_attname    AND to_atttype(t_relname, staattnum) = t_atttype    AND stainherit = t_stainherit    RETURNING *)    
ins as (    SELECT expression [, ...]    WHERE NOT EXISTS (SELECT * FROM upsert)    AND to_attnum(t_relname, t_attname) IS NOT NULL    AND to_atttype(t_relname, t_attname) = t_atttype)    
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;    where expression can be one of:    array_in(array_text, type_name::regtype::oid, -1)    
value::type_name

stat導入的實際例子

(表示public.test表的info列的統計信息，如果存在則更新，不存在則插入。)

WITH   upsert as (   UPDATE pg_catalog.pg_statistic   SET   stanullfrac = 0, stawidth = 4, stadistinct = -1, stakind1 = 2, stakind2 = 3, stakind3 = 0, stakind4 = 0, stakind5 = 0, staop1 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop2 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop3 = '0'::regoperator, staop4 = '0'::regoperator, staop5 = '0'::regoperator, stanumbers1 = NULL::real[], stanumbers2 = '{-0.5}'::real[], stanumbers3 = NULL::real[], stanumbers4 = NULL::real[], stanumbers5 = NULL::real[], stavalues1 = array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray, stavalues2 = NULL::anyarray, stavalues3 = NULL::anyarray, stavalues4 = NULL::anyarray, stavalues5 = NULL::anyarray   WHERE to_schema_qualified_relation(starelid) = 'public.test' AND to_attname('public.test', staattnum) = 'info' AND to_atttype('public.test', staattnum) = 'pg_catalog.text' AND stainherit = false   RETURNING *  ),   ins as (   SELECT   'public.test'::regclass,   to_attnum('public.test', 'info'),   'false'::boolean,   0::real,   4::integer,   -1::real,   2,  -- stakind=2 表示柱狀圖  3,  -- stakind=3 表示相關性  0,   0,   0,   'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator,   'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator,   '0'::regoperator,   '0'::regoperator,   '0'::regoperator,   NULL::real[],   '{-0.5}'::real[],   NULL::real[],   NULL::real[],   NULL::real[],   array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray,   NULL::anyarray,   NULL::anyarray,   NULL::anyarray,   NULL::anyarray   WHERE NOT EXISTS (SELECT * FROM upsert) AND to_attnum('public.test', 'info') IS NOT NULL AND to_atttype('public.test', 'info') = 'pg_catalog.text'  )   
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;

導入

1、修改postgresql.conf，允許修改系統表，重啟數據庫生效配置

vi postgresql.conf  allow_system_table_mods=on  pg_ctl restart -m fast

2、導入統計信息

-- 1 pg_class    update pg_class set reltuples=?, relpages=? where oid=?;    -- 2 pg_statistic    WITH upsert as (    UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]    WHERE to_schema_qualified_relation(starelid) = t_relname    AND to_attname(t_relname, staattnum) = t_attname    AND to_atttype(t_relname, staattnum) = t_atttype    AND stainherit = t_stainherit    RETURNING *)    
ins as (    SELECT expression [, ...]    WHERE NOT EXISTS (SELECT * FROM upsert)    AND to_attnum(t_relname, t_attname) IS NOT NULL    AND to_atttype(t_relname, t_attname) = t_atttype)    
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;    where expression can be one of:    array_in(array_text, type_name::regtype::oid, -1)    
value::type_name

3、導入完成后，將allow_system_table_mods設置為off，重啟數據庫。

dump_statistic代碼

CREATE FUNCTION dump_statistic(relid oid) RETURNS SETOF TEXT AS $$  DECLARE  result  text;  cmd             text;           -- main query  in_args text;           -- args for insert  up_args text;           -- args for upsert  fstaop  text := '%s::regoperator';  arr_in  text := 'array_in(%s, %s::regtype, -1)::anyarray';  stacols text[] = ARRAY['stakind', 'staop',  'stanumbers', 'stavalues' ];  r               record;  i               int;  j               text;  ncols   int := 26;      -- number of columns in pg_statistic  stanum  text[];         -- stanumbers{1, 5}  staval  text[];         -- stavalues{1, 5}  staop   text[];         -- staop{1, 5}  relname text;           -- quoted relation name  attname text;           -- quoted attribute name  atttype text;           -- quoted attribute type  BEGIN  for r in  select * from pg_catalog.pg_statistic  where starelid = relid  and get_namespace(starelid) != to_namespace('information_schema')  and get_namespace(starelid) != to_namespace('pg_catalog') loop  relname := to_schema_qualified_relation(r.starelid);  attname := quote_literal(to_attname(relname, r.staattnum));  atttype := quote_literal(to_atttype(relname, r.staattnum));  relname := quote_literal(relname); -- redefine relname  in_args := '';  up_args = 'stanullfrac = %s, stawidth = %s, stadistinct = %s, ';  cmd := 'WITH upsert as ( ' ||  'UPDATE pg_catalog.pg_statistic SET %s ' ||  'WHERE to_schema_qualified_relation(starelid) = ' || relname || ' '  'AND to_attname(' || relname || ', staattnum) = ' || attname || ' '  'AND to_atttype(' || relname || ', staattnum) = ' || atttype || ' '  'AND stainherit = ' || r.stainherit || ' ' ||  'RETURNING *), ' ||  'ins as ( ' ||  'SELECT %s ' ||  'WHERE NOT EXISTS (SELECT * FROM upsert) ' ||  'AND to_attnum(' || relname || ', ' || attname || ') IS NOT NULL '  'AND to_atttype(' || relname || ', ' || attname || ') = ' || atttype || ') '  'INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;';  for i in 1..ncols loop  in_args := in_args || '%s';  if i != ncols then  in_args := in_args || ', ';  end if;  end loop;  for j in 1..4 loop  for i in 1..5 loop  up_args := up_args || format('%s%s = %%s', stacols[j], i);  if i * j != 20 then  up_args := up_args || ', ';  end if;  end loop;  end loop;  cmd := format(cmd, up_args, in_args);   --prepare template for main query  staop := array[format(fstaop, quote_literal(to_schema_qualified_operator(r.staop1))),  format(fstaop, quote_literal(to_schema_qualified_operator(r.staop2))),  format(fstaop, quote_literal(to_schema_qualified_operator(r.staop3))),  format(fstaop, quote_literal(to_schema_qualified_operator(r.staop4))),  format(fstaop, quote_literal(to_schema_qualified_operator(r.staop5)))];  stanum := array[r.stanumbers1::text,  r.stanumbers2::text,  r.stanumbers3::text,  r.stanumbers4::text,  r.stanumbers5::text];  for i in 1..5 loop  if stanum[i] is null then  stanum[i] := 'NULL::real[]';  else  stanum[i] := '''' || stanum[i] || '''::real[]';  end if;  end loop;  if r.stavalues1 is not null then  staval[1] := format(arr_in, quote_literal(r.stavalues1),  quote_literal(  to_schema_qualified_type(  anyarray_elemtype(r.stavalues1))));  else  staval[1] := 'NULL::anyarray';  end if;  if r.stavalues2 is not null then  staval[2] := format(arr_in, quote_literal(r.stavalues2),  quote_literal(  to_schema_qualified_type(  anyarray_elemtype(r.stavalues2))));  else  staval[2] := 'NULL::anyarray';  end if;  if r.stavalues3 is not null then  staval[3] := format(arr_in, quote_literal(r.stavalues3),  quote_literal(  to_schema_qualified_type(  anyarray_elemtype(r.stavalues3))));  else  staval[3] := 'NULL::anyarray';  end if;  if r.stavalues4 is not null then  staval[4] := format(arr_in, quote_literal(r.stavalues4),  quote_literal(  to_schema_qualified_type(  anyarray_elemtype(r.stavalues4))));  else  staval[4] := 'NULL::anyarray';  end if;  if r.stavalues5 is not null then  staval[5] := format(arr_in, quote_literal(r.stavalues5),  quote_literal(  to_schema_qualified_type(  anyarray_elemtype(r.stavalues5))));  else  staval[5] := 'NULL::anyarray';  end if;  --DEBUG  --staop := array['{arr}', '{arr}', '{arr}', '{arr}', '{arr}'];  --stanum := array['{num}', '{num}', '{num}', '{num}', '{num}'];  --staval := array['{val}', '{val}', '{val}', '{val}', '{val}'];  result := format(cmd,  r.stanullfrac,  r.stawidth,  r.stadistinct,  -- stakind  r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,  -- staop  staop[1], staop[2], staop[3], staop[4], staop[5],  -- stanumbers  stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],  -- stavalues  staval[1], staval[2], staval[3], staval[4], staval[5],  -- first 6 columns  format('%s::regclass', relname),  format('to_attnum(%s, %s)', relname, attname),  '''' || r.stainherit || '''::boolean',  r.stanullfrac || '::real',  r.stawidth || '::integer',  r.stadistinct || '::real',  -- stakind  r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,  -- staop  staop[1], staop[2], staop[3], staop[4], staop[5],  -- stanumbers  stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],  -- stavalues  staval[1], staval[2], staval[3], staval[4], staval[5]);  return next result;  end loop;  return;  END;  
$$ LANGUAGE plpgsql;

我們甚至可以將Oracle數據庫的統計信息，平移到PG數據庫，對齊需要的元素即可：

記錄數、占用多少個數據塊。每列的空值比例、平均列寬、唯一值比例或個數、高頻值以及頻率、柱狀圖分布、存儲相關性、多值列（高頻元素及比例、元素柱狀圖分布）。

好處：在遷移ORACLE數據時，可以關閉autovacuumm（提高導入速度），通過這種方法來導入統計信息。（只要元素對應即可，當然有些元素可能是ORACLE中不采集的，比如多值列的統計信息）。