列順序占用存儲大小的影響 in Oracle、MySQL、PostGreSQL
在創建表時,如果相同的列類型,不同表列的順序是否會影響數據庫占用空間大小?使用oracle、mysql或postgresql是不是相同的表現呢? 不是的Postgresql近期發現空間使用會因為columns的順序而占用不同的大小,當然也和實際的數據有關,簡單的測試。
Oracle
SQL> CREATE TABLE t_test ( i1 int,
i2 int,
i3 int,
v1 varchar(100),
v2 varchar(100),
v3 varchar(100)7 );Table created.INSERT INTO t_test SELECT 10, 20, 30,
'abcd', 'abcd', 'abcd'3 FROM dual connect by rownum<=10000; 10000 rows created.SQL> select BYTES from dba_segments where segment_name='T_TEST';BYTES
----------393216SQL> SELECT dbms_xplan.FORMAT_SIZE(BYTES) SEG_SIZE from dba_segments where segment_name='T_TEST';SEG_SIZE
------------------------------
384KSQL> DROP TABLE T_TEST;
Table dropped.SQL> CREATE TABLE t_test ( v1 varchar(100),
i1 int,
v2 varchar(100),
i2 int,
v3 varchar(100),
i3 int7 );Table created.SQL> INSERT INTO t_test SELECT 'abcd', 10, 'abcd',
20, 'abcd', 303 FROM dual connect by rownum<=10000; SQL> select BYTES from dba_segments where segment_name='T_TEST';BYTES
----------393216
Note:
在ORACLE數據庫中TABLE COLUMN順序打亂后, 表段大小一致。
MySQL
mysql> create table t_test( i1 int,i2 int, i3 int,v1 varchar(100),v2 varchar(100),v3 varchar(100));
Query OK, 0 rows affected (0.09 sec)mysql> DELIMITER $$
mysql> CREATE PROCEDURE LoadCal()-> BEGIN-> declare n int default 1;-> declare MAX int default 10001;-> while n < MAX do -> insert into t_test select 10,20,30,'abcd','abcd','abcd';-> set n = n + 1;-> end while;-> END$$
Query OK, 0 rows affected (0.02 sec)mysql> DELIMITER ;
mysql> select count(*) from t_test;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)mysql> call LoadCal();
Query OK, 1 row affected (1 min 17.73 sec)mysql> select count(*) from t_test;
+----------+
| count(*) |
+----------+
| 10000 |
+----------+
1 row in set (0.05 sec)mysql> SELECT TABLE_NAME ,-> CONCAT(ROUND((DATA_LENGTH)/1024,2),'KB') AS'total'-> FROM information_schema.TABLES WHERE TABLE_NAME='t_test';
+------------+-----------+
| TABLE_NAME | total |
+------------+-----------+
| t_test | 1552.00KB |
+------------+-----------+
1 row in set (0.00 sec)mysql> drop table t_test;
Query OK, 0 rows affected (0.04 sec)mysql> create table t_test( v1 varchar(100),i1 int, v2 varchar(100),i2 int,v3 varchar(100),i3 int);
Query OK, 0 rows affected (0.05 sec)mysql> drop procedure LoadCal;
Query OK, 0 rows affected (0.03 sec)mysql> DELIMITER $$
mysql> CREATE PROCEDURE LoadCal()-> BEGIN-> declare n int default 1;-> declare MAX int default 10001;-> while n < MAX do -> insert into t_test select 'abcd',10,'abcd',20,'abcd',30 ;-> set n = n + 1;-> end while;-> END$$
Query OK, 0 rows affected (0.02 sec)mysql> DELIMITER ;
mysql> call LoadCal;
Query OK, 1 row affected (1 min 15.33 sec)mysql> SELECT TABLE_NAME ,-> CONCAT(ROUND((DATA_LENGTH)/1024,2),'KB') AS'total'-> FROM information_schema.TABLES WHERE TABLE_NAME='t_test';
+------------+-----------+
| TABLE_NAME | total |
+------------+-----------+
| t_test | 1552.00KB |
+------------+-----------+
1 row in set (0.00 sec)
Note:
MySQL數據庫,表列不同的順序,表所占用空間大小也是一致。
PostGreSQL
anbob=# CREATE TABLE t_test ( i1 int,
anbob(# i2 int,
anbob(# i3 int,
anbob(# v1 varchar(100),
anbob(# v2 varchar(100),
anbob(# v3 varchar(100)
anbob(# );
CREATE TABLE
anbob=# INSERT INTO t_test SELECT 10, 20, 30,
anbob-# 'abcd', 'abcd', 'abcd'
anbob-# FROM generate_series(1, 10000);
INSERT 0 10000
anbob=# select pg_relation_size('t_test');pg_relation_size
------------------606208
(1 row)anbob=# select pg_size_pretty( pg_relation_size('t_test'));pg_size_pretty
----------------592 kB
(1 row)anbob=# drop table t_test;
DROP TABLE
anbob=# CREATE TABLE t_test ( v1 varchar(100),
anbob(# i1 int,
anbob(# v2 varchar(100),
anbob(# i2 int,
anbob(# v3 varchar(100),
anbob(# i3 int
anbob(# );
CREATE TABLE
anbob=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd',
anbob-# 20, 'abcd', 30
anbob-# FROM generate_series(1, 10000);
INSERT 0 10000
anbob=# select pg_size_pretty( pg_relation_size('t_test'));pg_size_pretty
----------------672 kB
(1 row)
Note:
在PostgreSQL數據中,盡管表中的數據完全相同,但該表已顯著增長。
這個問題的原因稱為alignment(對齊)。
PostgreSQL tuple內部有ALIGN機制,因此字段順序也有講究,選擇不好,可能因為ALIGN導致空間放大, 理論如下:
如果一個字段沒有以CPU word-size的倍數開始,那么 CPU 就會遇到困難,在代碼src/backend/access/common/heaptuple.c。
因此,PostgreSQL 會相應地在物理上對齊數據。 這里最重要的一點是,將具有相似數據類型的列彼此相鄰分組是有意義的。
當然,結果和潛在的大小差異在很大程度上取決于內容。 如果在此示例中使用“abc”而不是“abcd”,則結果不會顯示任何差異;
weejar=# select pg_column_size(row(int4 '10',varchar 'abc',int4 '10',varchar 'abc'));pg_column_size
----------------40
-- 24+4+4+4+4weejar=# select pg_column_size(row(int4 '10',int4 '10',varchar 'abc',varchar 'abc'));pg_column_size
----------------40
typalign char
typalign is the alignment required when storing a value of this type. It applies to storage on disk as well as most representations of the value inside PostgreSQL. When multiple values are stored consecutively, such as in the representation of a complete row on disk, padding is inserted before a datum of this type so that it begins on the specified boundary. The alignment reference is the beginning of the first datum in the sequence. Possible values are:
- c = char alignment, i.e., no alignment needed.
- s = short alignment (2 bytes on most machines).
- i = int alignment (4 bytes on most machines).
- d = double alignment (8 bytes on many machines, but by no means all).
weejar=# select pg_column_size(row(varchar 'abc'));pg_column_size
----------------28
weejar=# select pg_column_size(row(int4 '10'));pg_column_size
----------------28
weejar=# select pg_column_size(row(varchar 'abcd'));pg_column_size
----------------29
weejar=# select pg_column_size(row(varchar 'abcd',int4 '10'));pg_column_size
----------------36
Note:
36= header + “abcd” 5 取4倍數為8+int4 4
= 24+4+[1+(3補齊)}+4
在C或Go Lang開發對象中同樣存在該設計,如GO
查看這個表的對齊規則
SELECT a.attname, t.typname, t.typalign, t.typlen
FROM pg_class c
JOIN pg_attribute a ON (a.attrelid = c.oid)
JOIN pg_type t ON (t.oid = a.atttypid)
WHERE c.relname = 'xxx'
AND a.attnum >= 0
ORDER BY a.attnum;
Summary
測試發現目前只有postgresql會因為補齊問題,在相同數據不同的列順序時會產生不同的空間大小,而oracle和MySQL不存在,所以在PostgreSQL中注意列順序,同時在opengauss系也和PG相同的表現。
by the way, 對于10000行相同的數據也可以看出三個數據庫的磁盤空間耗費排列 MySQL (1500K)> PostgreSQL (600K) > Oracle (300K)