PostgreSQL（二十六）分區表管理

一、分區表特點

1、概念：

2、好處：

3、特點：

二、范圍分區介紹

1、簡介

2、范圍分區實驗：

三、list分區介紹

1、簡介

2、list分區表實驗

四、hash分區介紹

1、簡介

2、hash分區表實驗

五、混合分區介紹

1、簡介

2、混合分區表實驗

六、總結

一、分區表特點

1、概念：

????????分區表的核心是分而治之。將表數據分成更小的物理分片，減少搜索范圍，以此可以查詢提高性能。

????????分區表是關系型數據庫中比較常見的對大表的優化方式，數據庫管理系統一般都提供了分區管理，而業務可以直接訪問分區表而不需要調整業務架構，當然好的性能需要合理的分區訪問方式。

2、好處：

（1）改善查詢性能

（2）增強可用性：單個小分區表損壞，不影響其他分區表的使用

（3）維護方便

（4）均衡I/O：PG的一個表只能放在一個表空間下，一個表空間只能在一個磁盤上。但是分區表是很多表，可以存儲到不同的磁盤上，以達到均衡I/O的目的

3、特點：

（1）pg數據庫表分區表的結構：

由主表（父表）與分區表（子表）組成。

主表是創建子表的模板，它是一個正常的普通表，正常情況下它并不儲存任何數據；

分區表繼承并屬于一個主表，分區表中存儲所有的數據；

主表與分區表屬于一對多的關系，也就是說，一個主表包含多個分區表，而一個分區表只從屬于一個主表。

（2）官方聲明的分區實現方式：

a.繼承分區：PG10版本以前都是用的繼承分區（后續版本也可使用），但是管理相對麻煩。創建主表、分區表之后，還要創建存儲過程、觸發器等。繼承分區的數據存儲是依靠觸發器，來把數據分發到不同的分區表內。

b.聲明式分區：也叫原生分區，從PG10版本開始支持，相當于"官方支持"的分區表，也是最為推薦的分區方式。雖然與繼承分區不一樣，但是其內部也是用繼承表實現的，只是不需要用戶手動干預，使用起來更方便。

聲明式分區支持：范圍分區，list分區，hash分區。

本文章主要介紹這個分區方式。

c.第三方分區管理方式，比如pathman擴展等。

pathman可以讓分區表的管理更絲滑。例如：新插入的數據，其值不在分區表范圍內時，會插入失敗。但是使用pathman，會自動創建一個default分區，將不滿足分區范圍的數據插入到default分區中，避免插入失敗。

二、范圍分區介紹

1、簡介

范圍分區表一般指的一個分區的范圍，然后把滿足條件的行存放在該分區中。

最常見的是以日期做為分區條件，根據時間段分為不同的分區，存放不同時間段的數據。

2、范圍分區實驗：

# 創建主表
create table part_range(
order_id int,
name varchar(50)null,
saledate timestamp not null default now())
partition by range(saledate) ;# 在主表上創建一個主鍵約束后，子表上就會自動創建一樣的主鍵和索引
alter table part_range add primary key(order_id,saledate);\d+ part_range# 創建分區表（子表）
CREATE table p1_202401    //子表名字
PARTITION OF part_range     //基于（繼承）哪個主表
FOR values FROM ('2024-01-01 00:00:00') TO ('2024-02-01 00:00:00');    //分區范圍create table p1_202402 partition of part_range for values from ('2024-02-01 00:00:00') to ('2024-01 00:00:00:00');
create table p1_202403 partition of part_range for values from ('2024-03-01 00:00:00') to ('2024-04-01 00:00:00');
create table p1_202404 partition of part_range for values from ('2024-04-01 00:00:00') to ('2024-05-01 00:00:00');
create table p1_202405 partition of part_range for values from ('2024-05-01 00:00:00') to ('2024-06-01 00:00:00');
create table p1_202406 partition of part_range for values from ('2024-06-01 00:00:00') to ('2024-07-01 00:00:00');# 創建默認分區表，這樣有不屬于上面分區表的數據插入之后，就會進到默認表中
CREATE TABLE pn_default PARTITION OF part_range DEFAULT;     # 插入數據
insert into part_range select random()*10000,md5(g::text),g
from generate_series('2024-01-01'::date,'2024-06-30'::date,'1 minute') as g;# 查看數據
select tableoid::regclass,count(*)
from part_range 
group by tableoid::regclass;select tableoid::regclass,* from part_range limit 10;

Tip：在某些情況下，需要知道特定行來自哪個表，每個表中都有一個名為tableoid的系統列，使用regclass別名類型，它將象征性地打印表oid，可以列出行的原始表。

# 通過主表訪問分區表（強烈建議條件列是分區列，否則分區表無意義，數據庫會把所有分區表都遍歷一遍，大大降低性能）explain select * from part_range where saledate='2024-05-02';# 通過子表訪問（通過子表訪問時，如果訪問列不是分區列，但是索引列，也可以使用索引來提高一部分性能）
explain select * from p1_202401 where order_id=100;
explain select from p1_202401 where order_id=100;

三、list分區介紹

1、簡介

????????list分區以指定的分區值將數據存放到對應的分區上，然后把滿足條件的行存放在該分區中，最常見的是以某列值為分區條件，根據不同的列值存放在不同的分區。

2、list分區表實驗

# 創建主表
create table part_list(
city_id int not null,
name varchar(30),
population int)
partition by list(name) ;create index part_list_idx on part_list (name);    //list:分區方式；name:分區列\d+ part_list# 創建分區表
create table p1_list partition of part_list for values in ('fujian','zhejiang');
create table p2_list partition of part_list for values in ('shandong','jiangxi');# 插入數據
insert into part_list(city_id,name,population) values(1,'fujian',10);
insert into part_list(city_id,name,population) values(2,'zhejiang',20);
insert into part_list(city_id,name,population) values(3,'shandong',10);
insert into part_list(city_id,name,population) values(4,'jiangxi',30);# 查看數據
select tableoid::regclass,* from part_list;# 查看執行計劃
explain select * from part_list where name='fujian';

四、hash分區介紹

1、簡介

????????hash分區將數據散列存儲在各個分區上，以打散熱點數據存放到對應的分區上，然后把滿足條件的行存放在該分區中，最常見的是平均的把數據放在不同的分區。

2、hash分區表實驗

# 創建主表
create table part_hash
(order_id int,
name varchar(10))
partition by hash (order_id);create index part_hash_idx on part_hash (order_id);\d+ part_hash# 創建子表
create table p1_hash partition of part_hash for values with (modulus 3,remainder 0 );   //modules:除數；remainder:取模（余數）create table p2_hash partition of part_hash for values with (modulus 3,remainder 1);
create table p3_hash partition of part_hash for values with (modulus 3,remainder 2);# 插入數據
insert into part_hash values(generate_series(1,10000),'a');# 查詢數據
select tableoid::regclass,count(1) from part_hash group by tableoid::regclass;# 查看執行計劃
explain select * from part_hash where order_id=1000;

五、混合分區介紹

1、簡介

????????PG分區下面也可以建立子分區構成聯級模式，子分區可以有不同的分區方式，這樣的分區成為混合分區。

????????即：主表 --> 子表（分區表）--> 子分區? 的聯級關系。

????????當分區表的數據傾斜比較大時，就可以通過給大數據分區表創建子分區的方式平衡傾斜。

????????子分區的分區列可以和分區表的分區列一樣，也可以不一樣，在主表中指定子表的分區列，在子表中指定子分區的分區列。

????????例如下面這個例子，分區表以sale_date分區，子分區以category：

2、混合分區表實驗

1、創建主表
create table part_hunhe(
id int not null,
name varchar(20),
saledate timestamp)
partition by range(saledate) ;    //在主表指定，分區表使用哪一列進行分區。同樣的，應該在分區表指定，子分區表使用哪一列進行分區。\d+ part_hunhe# 創建分區表
create table part_2001 partition of part_hunhe for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00')
partition by list(name);     //在分區表創建時，指定子分區使用什么列進行分區create table part_2002 partition of part_hunhe for values from('2023-02-01 00:00:00') to ('2023-03-01 00:00:00')
partition by list(name);create table part_2003 partition of part_hunhe for values from('2023-03-01 00:00:00') to ('2023-04-01 00:00:00')
partition by list(name);create table part_2004 partition of part_hunhe for values from('2023-04-01 00:00:00') to ('2023-05-01 00:00:00')
partition by list(saledate);create table part_2005 partition of part_hunhe for values from('2023-05-01 00:00:00') to ('2023-06-01 00:00:00')
partition by range(saledate);\d+ part_2001# 創建子分區表
create table part_3001 partition of part_2001 for values in ('abc');
create table part_3002 partition of part_2001 for values in ('def');
create table part_3003 partition of part_2001 for values in ('jkl');
create table part_3004 partition of part_2004 for values in ('2023-04-01 00:00:00');
create table part_3005 partition of part_2005 for values from ('2023-05-01 00:00:00') to ('2023-05-10 00:00:00');\d+ part_3001# 插入數據
insert into part_hunhe values(random()*10000,'abc','2023-01-01 08:00:00');
insert into part_hunhe values(random()*10000,'def','2023-01-01 08:00:00');# 查詢數據
select tableoid::regclass,* from part_hunhe;# 查看執行計劃
explain select * from part_hunhe where name='adc';