PostgreSQL Oracle 兼容性之 - INDEX SKIP SCAN (遞歸查詢變態優化) 非驅動列索引掃描優化...

標簽

PostgreSQL , Oracle , index skip scan , 非驅動列條件 , 遞歸查詢 , 子樹


背景

對于輸入條件在復合索引中為非驅動列的,如何高效的利用索引掃描?

在Oracle中可以使用index skip scan來實現這類CASE的高效掃描:

INDEX跳躍掃描一般用在WHERE條件里面沒有使用到引導列,但是用到了引導列以外的其他列,并且引導列的DISTINCT值較少的情況。

在這種情況下,數據庫把這個復合索引邏輯上拆散為多個子索引,依次搜索子索引中非引導列的WHERE條件里面的值。

使用方法如下:

/*+ INDEX_SS ( [ @ qb_name ] tablespec [ indexspec [ indexspec ]... ] ) */  

The INDEX_SS hint instructs the optimizer to perform an index skip scan for the specified table. If the statement uses an index range scan, then Oracle scans the index entries in ascending order of their indexed values. In a partitioned index, the results are in ascending order within each partition.Each parameter serves the same purpose as in "INDEX Hint". For example:

SELECT /*+ INDEX_SS(e emp_name_ix) */ last_name FROM employees e WHERE first_name = 'Steven';  

下面是來自ORACLE PERFORMANCE TUNING里的原文:

Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.

Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.

The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.

Example 13-5 Index Skip Scan

Consider, for example, a table

employees(  
sex,   
employee_id,  
address  
)   

with a composite index on

(sex, employee_id).   

Splitting this composite index would result in two logical subindexes, one for M and one for F.

For this example, suppose you have the following index data:

('F',98)('F',100)('F',102)('F',104)('M',101)('M',103)('M',105)  

The index is split logically into the following two subindexes:

The first subindex has the keys with the value F.

The second subindex has the keys with the value M

pic

The column sex is skipped in the following query:

SELECT * FROM employeesWHERE employee_id = 101;  

A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M.

PostgreSQL 非skip scan

PostgreSQL支持非驅動列的索引掃描,但是需要掃描整個索引。

例子

1、創建測試表

postgres=# create table t(id int, c1 int);  
CREATE TABLE  

2、寫入1000萬測試數據

postgres=# insert into t select random()*1 , id from generate_series(1,10000000) id;  
INSERT 0 10000000  

3、創建多列索引

postgres=# create index idx_t on t(id,c1);  
CREATE INDEX  

4、非驅動列查詢測試如下

index only scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------  Index Only Scan using idx_t on public.t  (cost=10000000000.43..10000105164.89 rows=1 width=8) (actual time=0.043..152.288 rows=1 loops=1)  Output: id, c1  Index Cond: (t.c1 = 1)  Heap Fetches: 0  Buffers: shared hit=27326  Execution time: 152.328 ms  
(6 rows)  

index scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  Index Scan using idx_t on public.t  (cost=0.43..105165.99 rows=1 width=8) (actual time=0.022..151.845 rows=1 loops=1)  Output: id, c1  Index Cond: (t.c1 = 1)  Buffers: shared hit=27326  Execution time: 151.881 ms  
(5 rows)  

bitmap scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------  Bitmap Heap Scan on public.t  (cost=105164.88..105166.00 rows=1 width=8) (actual time=151.731..151.732 rows=1 loops=1)  Output: id, c1  Recheck Cond: (t.c1 = 1)  Heap Blocks: exact=1  Buffers: shared hit=27326  ->  Bitmap Index Scan on idx_t  (cost=0.00..105164.88 rows=1 width=0) (actual time=151.721..151.721 rows=1 loops=1)  Index Cond: (t.c1 = 1)  Buffers: shared hit=27325  Execution time: 151.777 ms  
(9 rows)  

seq scan(全表掃描)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  QUERY PLAN                                                  
---------------------------------------------------------------------------------------------------------  Seq Scan on public.t  (cost=0.00..169248.41 rows=1 width=8) (actual time=0.014..594.535 rows=1 loops=1)  Output: id, c1  Filter: (t.c1 = 1)  Rows Removed by Filter: 9999999  Buffers: shared hit=44248  Execution time: 594.568 ms  
(6 rows)  

使用索引掃,因為不需要FILTER,同時掃描的BLOCK更少,所以性能比全表掃略好。但是還是掃了整個索引的PAGE,所以并不能算skip scan。

那么如何讓PostgreSQL支持index skip scan呢?

PostgreSQL skip scan

實際上原理和Oracle類似,可以輸入驅動列條件,然后按多個條件掃描,這樣就能達到SKIP SCAN的效果。(即多顆子樹掃描)。

同樣也更加適合于驅動列DISTINCT值較少的情況。

用PostgreSQL的遞歸查詢語法可以實現這樣的加速效果。這種方法也被用于獲取count(distinct), distinct值等。

《distinct xx和count(distinct xx)的變態遞歸優化方法 - 索引收斂(skip scan)掃描》

例如,我們通過這個方法,可以快速的得到驅動列的唯一值

with recursive skip as (    (    select min(t.id) as id from t where t.id is not null    )    union all    (    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)     from skip s where s.id is not null    )  -- 這里的where s.id is not null 一定要加,否則就死循環了.    
)     
select id from skip ;  

然后封裝到如下SQL,實現skip scan的效果

explain (analyze,verbose,timing,costs,buffers) select * from t where id in  
(  
with recursive skip as (    (    select min(t.id) as id from t where t.id is not null    )    union all    (    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)     from skip s where s.id is not null    )  -- 這里的where s.id is not null 一定要加,否則就死循環了.    
)     
select id from skip   
) and c1=1  
union all   
select * from t where id is null and c1=1;  

或者

explain (analyze,verbose,timing,costs,buffers) select * from t where id = any(array  
(  
with recursive skip as (    (    select min(t.id) as id from t where t.id is not null    )    union all    (    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)     from skip s where s.id is not null    )  -- 這里的where s.id is not null 一定要加,否則就死循環了.    
)     
select id from skip   
)) and c1=1  
union all   
select * from t where id is null and c1=1;  

看執行計劃:

效果好多了

  QUERY PLAN                                                                                          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  Append  (cost=55.00..215.22 rows=2 width=8) (actual time=0.127..0.138 rows=1 loops=1)  Buffers: shared hit=21  ->  Nested Loop  (cost=55.00..213.64 rows=1 width=8) (actual time=0.126..0.127 rows=1 loops=1)  Output: t.id, t.c1  Buffers: shared hit=18  ->  HashAggregate  (cost=54.57..55.58 rows=101 width=4) (actual time=0.108..0.109 rows=3 loops=1)  Output: skip.id  Group Key: skip.id  Buffers: shared hit=11  ->  CTE Scan on skip  (cost=51.29..53.31 rows=101 width=4) (actual time=0.052..0.102 rows=3 loops=1)  Output: skip.id  Buffers: shared hit=11  CTE skip  ->  Recursive Union  (cost=0.46..51.29 rows=101 width=4) (actual time=0.050..0.099 rows=3 loops=1)  Buffers: shared hit=11  ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.049..0.049 rows=1 loops=1)  Output: $1  Buffers: shared hit=4  InitPlan 3 (returns $1)  ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.045..0.046 rows=1 loops=1)  Output: t_3.id  Buffers: shared hit=4  ->  Index Only Scan using idx_t on public.t t_3  (cost=0.43..205165.21 rows=10000033 width=4) (actual time=0.045..0.045 rows=1 loops=1)  Output: t_3.id  Index Cond: (t_3.id IS NOT NULL)  Heap Fetches: 0  Buffers: shared hit=4  ->  WorkTable Scan on skip s  (cost=0.00..4.88 rows=10 width=4) (actual time=0.015..0.015 rows=1 loops=3)  Output: (SubPlan 2)  Filter: (s.id IS NOT NULL)  Rows Removed by Filter: 0  Buffers: shared hit=7  SubPlan 2  ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=2)  Output: $3  Buffers: shared hit=7  InitPlan 1 (returns $3)  ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=2)  Output: t_2.id  Buffers: shared hit=7  ->  Index Only Scan using idx_t on public.t t_2  (cost=0.43..76722.42 rows=3333344 width=4) (actual time=0.017..0.017 rows=0 loops=2)  Output: t_2.id  Index Cond: ((t_2.id > s.id) AND (t_2.id IS NOT NULL))  Heap Fetches: 0  Buffers: shared hit=7  ->  Index Only Scan using idx_t on public.t  (cost=0.43..1.56 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3)  Output: t.id, t.c1  Index Cond: ((t.id = skip.id) AND (t.c1 = 1))  Heap Fetches: 0  Buffers: shared hit=7  ->  Index Only Scan using idx_t on public.t t_1  (cost=0.43..1.56 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)  Output: t_1.id, t_1.c1  Index Cond: ((t_1.id IS NULL) AND (t_1.c1 = 1))  Heap Fetches: 0  Buffers: shared hit=3  Execution time: 0.256 ms  
(56 rows)  

從150多毫秒,降低到了0.256毫秒

內核層面優化

與Oracle做法類似,或者說與遞歸的做法類似。

使用這種方法來改進優化器,可以達到index skip scan的效果,而且不用改寫SQL。

參考

《distinct xx和count(distinct xx)的變態遞歸優化方法 - 索引收斂(skip scan)掃描》

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/254850.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/254850.shtml
英文地址,請注明出處:http://en.pswp.cn/news/254850.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

如何確定鏡頭CCD靶面尺寸?

在組建機器視覺系統時,需要選用適合實際應用的產品。今天,中國機器視覺商城的培訓課堂為您帶來的是關于工業鏡頭CCD靶面尺寸的確定方法。 在選擇鏡頭時,我們通常要注意一個原則:即小尺寸靶面的CCD可使用對應規格更大的鏡頭&#x…

lua去掉字符串中的UTF-8的BOM三個字節

廢話不多說,還是先說點吧,項目中lua讀取的text文件如果有BOM,客戶端解析就會報錯,所以我看了看,任務編輯器swGameTaskEditor 在寫入文件的時候,也不知道為什么有的文件就是UTF-8BOM格式;但一般都…

JQuery對象與DOM對象的區別與轉換

1.jQuery對象和DOM對象的區別 DOM對象,即是我們用傳統的方法(javascript)獲得的對象,jQuery對象即是用jQuery類庫的選擇器獲得的對象; eg: var domObj document.getElementById("id"); //DOM對象var $obj $("#id"); //jQuery對象;…

halcon append_ocr_trainf 將字符添加到訓練文件中

目錄append_ocr_trainf(算子)描述參數append_ocr_trainf(算子) append_ocr_trainf - 將字符添加到訓練文件中。 append_ocr_trainf(Character,Image :: Class,TrainingFile ? 描述 運算符a…

CCD 尺寸

CCD(包括CMOS感光元件)的面積是按其矩形對角線英寸長度為指標的。這和定義電視屏幕尺寸類似。一英寸是25.4毫米。1/2.0英寸、1/1.8都是指CCD 對角線有多少分之一英寸長,分母小的其分數值就大,相應感光元件面積也大。 1/2.…

Quagga的安裝碰到的問題

1.如果出現以下錯誤: vtysh: symbol lookup error: /usr/local/lib/libreadline.so.6: undefined symbol: UP 解決方法如下: 1.rootlocalhost:~ # cd /usr/local/lib 2.rootlocalhost:/usr/local/lib# ls -la libreadline* 3.rootlocalhost:/usr/local/lib# mkd…

X264電影壓縮率畫質

X264電影壓縮率畫質全對比: http://www.mov8.com/dvd/freetalk_show.asp?id29778

halcon read_ocr_trainf 從文件中讀取訓練字符并轉換為圖像

目錄read_ocr_trainf(算子)描述參數read_ocr_trainf(算子) read_ocr_trainf - 從文件中讀取訓練字符并轉換為圖像。 read_ocr_trainf(:Characters:TrainingFile:CharacterNames&am…

(十二)洞悉linux下的Netfilteramp;iptables:iptables命令行工具源碼解析【下】

iptables用戶空間和內核空間的交互 iptables目前已經支持IPv4和IPv6兩個版本了,因此它在實現上也需要同時兼容這兩個版本。iptables-1.4.0在這方面做了很好的設計,主要是由libiptc庫來實現。libiptc是iptables control library的簡稱,是Netfi…

Linux 下實現普通用戶只能寫入某個目錄

今天老婆問了我一個問題:如何在linux 下實現某個目錄普通用戶能夠寫入文件,但是不能刪除或修改(只能由root 刪除或修改)。開始的兩分鐘里,我初步判斷這是做不到的,因為linux 下能 寫入(w&#x…

CCD和CMOS攝像頭成像原理以及其他區別

CCD的第二層是分色濾色片,目前有兩種分色方式,一是RGB原色分色法,另一個則是CMYG補色分色法,這兩種方法各有利弊。不過以產量來看,原色和補色CCD的比例大約在2:1左右。原色CCD的優…

FFMPEG分析比較細的文章

http://blog.csdn.net/ym012/article/details/6538301

恢復Ext3下被刪除的文件(轉)

前言 下面是這個教程將教你如何在Ext3的文件系統中恢復被rm掉的文件。 刪除文件 假設我們有一個文件名叫 ‘test.txt’ $ls -il test.txt15 -rw-rw-r– 2 root root 20 Apr 17 12:08 test.txt 注意:: “-il” 選項表示顯示文件的i-node號(15)…

halcon trainf_ocr_class_svm 訓練OCR分類器

目錄trainf_ocr_class_svm(算子)描述參數trainf_ocr_class_svm(算子) trainf_ocr_class_svm - 訓練OCR分類器。 trainf_ocr_class_svm(:: OCRHandle,TrainingFile,Epsilon,TrainMo…

Javascript之全局變量和局部變量部分講解

以此文作為自己學習的一個總結。 關于全局變量和局部變量的一句簡單的定義:在函數外聲明的變量都為全局變量,在函數內聲明的為局部變量。 一、局部變量和全局變量重名會覆蓋全局變量 1 var a 1; 2 function test1() { 3 var a 2; 4 ale…

XML-RPC使用手冊

內容列表 Preface: About This Manual Introduction to XML-RPC for C/C What is XML-RPC? How Does XML-RPC For C/C Help? More Information On XML-RPC For C/CThe Xmlrpc-c Function Libraries C Libraries C LibrariesUtility Programs xmlrpc xmlrpc_dumpserverAlterna…

利用ffmpeg來進行視頻解碼的完整示例代碼(H.264)

Decode() { FILE * inpf; int nWrite; int i,p; int nalLen; unsigned char* Buf; int got_picture, consumed_bytes; unsigned char *DisplayBuf; DisplayBuf(unsigned char *)malloc(60000); char outfile[] "test.pgm"; //1.打開輸入文件 inpf fopen("test…

如何成為非標行業的大拿

1,選一個好的舞臺(工作環境),有個廣告詞叫:‘心有多大,舞臺就有多大’,我想變個說法叫‘舞臺越大,心就越大’。決定你表演效果的舞臺,你如果選擇…

TCP UDP HTTP 的關系和區別

TCP UDP HTTP 三者的關系: TCP/IP是個協議組,可分為四個層次:網絡接口層、網絡層、傳輸層和應用層。 在網絡層有IP協議、ICMP協議、ARP協議、RARP協議和BOOTP協議。 在傳輸層中有TCP協議與UDP協議。 在應用層有HTTP、FTP、TELNET、SMTP、DNS等協議。 TCP…

微信開放平臺全網發布時,檢測失敗 —— C#

主要就是三個:返回API文本消息,返回普通文本消息,發送事件消息 --會出現失敗的情況 (后續補充說明:出現檢測出錯,不一定是代碼出現了問題,也有可能是1.微信方面檢測時出現服務器請求失敗&…