一、表連接介紹
表連接類型
表連接是關系型數據庫關鍵特性,在關系型數據庫中,表連接分為三類:循環嵌套連接(Nested Loops Join)、哈希連接(Hash Join)、合并排序連接(Merge Sort Join)。假設存在表A和表B,都存在ID列,通過id列連接。
Nested Loops Join:遍歷A中所有id,依次拿id和表B中的id對比。
Hash Join:對A中的id做HASH運算,放入多個HASH Bucket中,根據B表的id列的hash值和HASH Bucket匹配。
Merge Sort Join:對A表和B表的id列進行排序,按照排序結果進行連接。
表連接分析
表的訪問次數,由執行計劃的starts列來表示
- NL連接驅動表被訪問0次或1次,被驅動表被訪問0次或N次,N的值取決于驅動表的返回行數。
- HASH連接,驅動表和被驅動表都是被訪問0次或1次,大部分場景是驅動表和被驅動表都訪問1次。
- MS連接,驅動表和被驅動表都是被訪問0次或1次,大部分場景是驅動表和被驅動表都訪問1次。
驅動順序
- NL連接的性能與驅動順序有關,一般小表做驅動表,性能更好。
- HASH連接的性能和驅動順序有關,一般小表做驅動表。
- MS連接的性能和驅動順序無關。
是否排序
- NL連接不排序、不多余消耗內存
- HASH連接不排序,但是hash area多消耗內存
- MS連接排序,消耗sort area內存
使用限制
- NL連接支持各種寫法,無限制。
- HASH連接支持等值連接,不支持>、<、like、<>等。
- MS連接不支持>、<、like、<>等
適用場景
- NL連接一般適用于OLTP系統
- HASH連接、MS連接適用于OLAP系統
二、表連接測試
創建測試數據
--創建表T1和T2
CREATE TABLE t1 (
id NUMBER NOT NULL,
n NUMBER,
contents VARCHAR2(4000));CREATE TABLE t2 (
id NUMBER NOT NULL,
t1_id NUMBER NOT NULL,
n NUMBER,
contents VARCHAR2(4000));--插入測試數據
execute dbms_random.seed(0);
INSERT INTO t1
SELECT rownum, rownum, dbms_random.string('a', 50)
FROM dual
CONNECT BY level <= 100
ORDER BY dbms_random.random;
INSERT INTO t2 SELECT rownum, rownum, rownum, dbms_random.string('b', 50) FROM dual CONNECT BY level <= 100000
ORDER BY dbms_random.random;
COMMIT;--查看數據量
SQL> select count(*) from t1;COUNT(*)
----------100
SQL> select count(*) from t2;COUNT(*)
----------100000
NL連接優化實驗
-- 兩個表無索引執行計劃
alter session set statistics_level=all;
select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.02 | 1016 | | | |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.02 | 1016 | 1000K| 1000K| 407K (0)|
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1 | 1 |00:00:00.01 | 7 | | | |
| 3 | TABLE ACCESS FULL| T2 | 1 | 103K| 100K|00:00:00.01 | 1006 | | | |
------------------------------------------------------------------------------------------------------------------驅動表創建索引create index t1_n on t1(n);select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.05 | 1008 |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.05 | 1008 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 3 | INDEX RANGE SCAN | T1_N | 1 | 1 | 1 |00:00:00.01 | 1 |
| 4 | TABLE ACCESS FULL | T2 | 1 | 103K| 100K|00:00:00.02 | 1006 |
-------------------------------------------------------------------------------------------------------
--被驅動表創建索引
CREATE INDEX t2_t1_id ON t2(t1_id);
select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name |Starts| E-Rows|A-Rows| A-Time |Buffers|Reads |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1| | 1|00:00:00.01 | 7| 4|
| 1 | NESTED LOOPS | | 1| 1| 1|00:00:00.01 | 7| 4|
| 2 | NESTED LOOPS | | 1| 1| 1|00:00:00.01 | 6| 4|
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1| 1| 1|00:00:00.01 | 3| 0|
|* 4 | INDEX RANGE SCAN | T1_N | 1| 1| 1|00:00:00.01 | 2| 0|
|* 5 | INDEX RANGE SCAN | T2_T1_ID| 1| 1| 1|00:00:00.01 | 3| 4|
| 6 | TABLE ACCESS BY INDEX ROWID | T2 | 1| 1| 1|00:00:00.01 | 1| 0|
-----------------------------------------------------------------------------------------------------------
--大表驅動小表
SELECT /*+ leading(t2) use_nl(t1) */ *2 FROM t1, t23 WHERE t1.id = t2.t1_id4 AND t1.n = 19;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.11 | 1013 |
| 1 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.11 | 1013 |
| 2 | NESTED LOOPS | | 1 | 103K| 100K|00:00:00.07 | 1011 |
| 3 | TABLE ACCESS FULL | T2 | 1 | 103K| 100K|00:00:00.01 | 1006 |
|* 4 | INDEX RANGE SCAN | T1_N | 100K| 1 | 100K|00:00:00.04 | 5 |
|* 5 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 1 |00:00:00.02 | 2 |
-----------------------------------------------------------------------------------------------
--盡管兩個表都建立了索引,但是如果大表驅動小表,反而消耗更大。
--總結:
--驅動表和被驅動表考慮創建索引
--確保小表驅動大表
HASH連接優化實驗
--刪除索引drop index t1_n;drop index t2_t1_id;select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.02 | 1016 | | | |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.02 | 1016 | 1000K| 1000K| 407K (0)|
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1 | 1 |00:00:00.01 | 7 | | | |
| 3 | TABLE ACCESS FULL| T2 | 1 | 103K| 100K|00:00:00.01 | 1006 | | | |
------------------------------------------------------------------------------------------------------------------創建索引
create index idx_t1_n on t1(n);--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.06 | 1008 | | | |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.06 | 1008 | 1000K| 1000K| 402K (0)|
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 3 | INDEX RANGE SCAN | IDX_T1_N | 1 | 1 | 1 |00:00:00.01 | 1 | | | |
| 4 | TABLE ACCESS FULL | T2 | 1 | 100K| 100K|00:00:00.02 | 1006 | | | |
----------------------------------------------------------------------------------------------------------------------------------------總結
--驅動表和被驅動表考慮創建索引
--確保小表驅動大表
--由于HASH操作需要額外內存區域(hash area),如果需要做hash運算的數據太多,則會用的臨時表空間,涉及磁盤IO會大大降低性能,索引盡量保證hash運算在內存中完成。
MS連接優化實驗
--Merge Sort Join兩表限制條件皆無索引SQL> SELECT /*+ leading(t2) use_merge(t1)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id4 and t1.n=19;---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.06 | 1007 | | | |
| 1 | MERGE JOIN | | 1 | 1 | 1 |00:00:00.06 | 1007 | | | |
| 2 | SORT JOIN | | 1 | 100K| 20 |00:00:00.06 | 1005 | 9762K| 1209K| 8677K (0)|
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 100K|00:00:00.01 | 1005 | | | |
|* 4 | SORT JOIN | | 20 | 1 | 1 |00:00:00.01 | 2 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 6 | INDEX RANGE SCAN | IDX_T1_N | 1 | 1 | 1 |00:00:00.01 | 1 | | | |
--------------------------------------------------------------------------------------------------------------------限制條件有索引
create index idx_t1_n on t1(n);
create index idx_t2_n on t2(n);
SQL> SELECT /*+ leading(t2) use_merge(t1)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id4 and t1.n=195 and t2.n=12;SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 5 | | | |
| 1 | MERGE JOIN | | 1 | 1 | 0 |00:00:00.01 | 5 | | | |
| 2 | SORT JOIN | | 1 | 1 | 1 |00:00:00.01 | 3 | 2048 | 2048 | 2048 (0)|
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
|* 4 | INDEX RANGE SCAN | IDX_T2_N | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 5 | SORT JOIN | | 1 | 1 | 0 |00:00:00.01 | 2 | 2048 | 2048 | 2048 (0)|
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 7 | INDEX RANGE SCAN | IDX_T1_N | 1 | 1 | 1 |00:00:00.01 | 1 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------連接條件創建索引
set autotrace traceonly
SQL> SELECT /*+ leading(t1) use_merge(t2)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id;
Statistics
----------------------------------------------------------1 recursive calls0 db block gets1012 consistent gets0 physical reads0 redo size13999 bytes sent via SQL*Net to client673 bytes received via SQL*Net from client8 SQL*Net roundtrips to/from client2 sorts (memory)0 sorts (disk)100 rows processedCREATE INDEX idx_t1_id ON t1(id);
CREATE INDEX idx_t2_t1_id ON t2(t1_id);
set autotrace traceonly
SQL> SELECT /*+ leading(t1) use_merge(t2)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id;100 rows selected.Statistics
----------------------------------------------------------1 recursive calls0 db block gets1021 consistent gets0 physical reads0 redo size13999 bytes sent via SQL*Net to client673 bytes received via SQL*Net from client8 SQL*Net roundtrips to/from client1 sorts (memory)0 sorts (disk)100 rows processed