深入理解Oracle的并行操作【好文認真讀】

請尊重原文作者，http://czmmiao.iteye.com/blog/1487568

--------------------------------------------------------------------------------

并行（Parallel）和OLAP系統?
并行的實現機制是：首先，Oracle會創建一個進程用于協調并行服務進程之間的信息傳遞，這個協調進程將需要操作的數據集（比如表的數據塊）分割成很多部分，稱為并行處理單元，然后并行協調進程給每個并行進程分配一個數據單元。比如有四個并行服務進程，他們就會同時處理各自分配的單元，當一個并行服務進程處理完畢后，協調進程就會給它們分配另外的單元，如此反復，直到表上的數據都處理完畢，最后協調進程負責將每個小的集合合并為一個大集合作為最終的執行結果，返回給用戶。
并行處理的機制實際上就是把一個要掃描的數據集分成很多小數據集，Oracle會啟動幾個并行服務進程同時處理這些小數據集，最后將這些結果匯總，作為最終的處理結果返回給用戶。?
這種數據并行處理方式在OLAP系統中非常有用，OLAP系統的表通常來說都是非常大，如果系統的CPU比較多，讓所有的CPU共同來處理這些數據，效果就會比串行執行要高得多。?
然而對于OLTP系統，通常來講，并行并不合適，原因是OLTP系統上幾乎在所有的SQL操作中，數據訪問路徑基本上以索引訪問為主，并且返回結果集非常小，這樣的SQL操作的處理速度一般非常快，不需要啟用并行。?
并行處理的機制?
當Oracle數據庫啟動的時候，實例會根據初始化參數 PARALLEL_MIN_SERVERS=n的值來預先分配n個并行服務進程，當一條SQL被CBO判斷為需要并行執行時發出SQL的會話進程變成并行協助進程，它按照并行執行度的值來分配進程服務器進程。
首先協調進程會使用ORACLE啟動時根據參數： parallel_min_servers=n的值啟動相應的并行服務進程，如果啟動的并行服務器進程數不足以滿足并行度要求的并行服務進程數，則并行協調進程將額外啟動并行服務進程以提供更多的并行服務進程來滿足執行的需求。然后并行協調進程將要處理的對象劃分成小數據片，分給并行服務進程處理；并行服務進程處理完畢后將結果發送給并行協調進程，然后由并行協調進程將處理結果匯總并發送給用戶。?
剛才講述的是一個并行處理的基本流程。實際上，在一個并行執行的過程中，還存在著并行服務進程之間的通信問題。?
在一個并行服務進程需要做兩件事情的時候，它會再啟用一個進程來配合當前的進程完成一個工作，比如這樣的一條SQL語句：?
Select * from employees order by last_name;?
假設employees表中last_name列上沒有索引，并且并行度為4，此時并行協調進程會分配4個并行服務進程對表employees進行全表掃描操作，因為需要對結果集進行排序，所以并行協調進程會額外啟用4個并行服務進程，用于處理4個進程傳送過來的數據，這新啟用的用戶處理傳遞過來數據的進程稱為父進程，用戶傳出數據（最初的4個并行服務進程）稱為子進程，這樣整個并行處理過程就啟用了8個并行服務進程。其中每個單獨的并行服務進程的行為叫作并行的內部操作，而并行服務進程之間的數據交流叫做并行的交互操作。?
這也是有時我們發現并行服務進程數量是并行度的2倍，就是因為啟動了并行服務父進程操作的緣故。?
讀懂一個并行處理的執行計劃?
CREATE TABLE emp2 AS SELECT * FROM employees;?
ALTER TABLE emp2 PARALLEL 2;?
EXPLAIN PLAN FOR?SELECT SUM(salary) FROM emp2 GROUP BY department_id;
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------
Plan hash value: 3939201228
------------------------------------------------------------------------------------------------------------------
| Id? | Operation??????????????? | Name???? | Rows? | Bytes | Cost (%CPU)| Time???? |??? TQ? |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|?? 0 | SELECT STATEMENT???????? |????????? |?? 107 |? 2782 |???? 3? (34)| 00:00:01 |??????? |????? |??????????? |
|?? 1 |? PX COORDINATOR????????? |????????? |?????? |?????? |??????????? |????????? |??????? |????? |??????????? |
|?? 2 |?? PX SEND QC (RANDOM)??? | :TQ10001 |?? 107 |? 2782 |???? 3? (34)| 00:00:01 |? Q1,01 | P->S | QC (RAND)? |
|?? 3 |??? HASH GROUP BY???????? |????????? |?? 107 |? 2782 |???? 3? (34)| 00:00:01 |? Q1,01 | PCWP |??????????? |
|?? 4 |???? PX RECEIVE?????????? |????????? |?? 107 |? 2782 |???? 3? (34)| 00:00:01 |? Q1,01 | PCWP |??????????? |
|?? 5 |????? PX SEND HASH??????? | :TQ10000 |?? 107 |? 2782 |???? 3? (34)| 00:00:01 |? Q1,00 | P->P | HASH?????? |
|?? 6 |?????? HASH GROUP BY????? |????????? |?? 107 |? 2782 |???? 3? (34)| 00:00:01 |? Q1,00 | PCWP |??????????? |
|?? 7 |??????? PX BLOCK ITERATOR |????????? |?? 107 |? 2782 |???? 2?? (0)| 00:00:01 |? Q1,00 | PCWC |??????????? |
|?? 8 |???????? TABLE ACCESS FULL| EMP2???? |?? 107 |? 2782 |???? 2?? (0)| 00:00:01 |? Q1,00 | PCWP |??????????? |
------------------------------------------------------------------------------------------------------------------
Note
-----
?? - dynamic sampling used for this statement
19 rows selected.?
通過執行計劃，我們來看一下它的執行步驟：?
1、并行服務進程對EMP2表進行全表掃描。?
2、并行服務進程以ITERATOR（迭代）方式訪問數據塊，也就是并行協調進程分給每個并行服務進程一個數據片，在這個數據片上，并行服務進程順序地訪問每個數據塊（Iterator），所有的并行服務進程將掃描的數據塊傳給另一組并行服務進程（父進程）用于做Hash Group操作。?
3、并行服務父進程對子進程傳遞過來的數據做Hash Group操作。?
4、并行服務進程（子進程）將處理完的數據發送出去。?
5、并行服務進程（父進程）接收到處理過的數據。?
6、合并處理過的數據，按照隨機的順序發給并行協調進程（QC：Query Conordinator）。?
7、并行協調進程將處理結果發給用戶。?
當使用了并行執行，SQL的執行計劃中就會多出一列：in-out。該列幫助我們理解數據流的執行方法。它的一些值的含義如下：?
Parallel to Serial（P->S）: 表示一個并行操作發送數據給一個串行操作
Parallel to Parallel（P->P）：表示一個并行操作向另一個并行操作發送數據
Parallel Combined with parent(PCWP): 同一個從屬進程執行的并行操作，同時父操作也是并行的。?
Parallel Combined with Child(PCWC): 同一個從屬進程執行的并行操作，子操作也是并行的。?
Serial to Parallel（S->P）: 一個串行操作發送數據給并行操作，如果select部分是串行操作，就會出現這個情況。?
并行執行等待事件?
在做并行執行方面的性能優化的時候，可能會遇到如下等待事件?
PX Deq Credit: send blkd?
這是一個有并行環境的數據庫中，從statspack 或者AWR中經常可以看到的等待事件。在Oracle 9i 里面，這個等待時間被列入空閑等待。?
一般來說空閑等待可以忽略它，但是實際上空閑等待也是需要關注的，因為一個空閑的等待，它反映的是另外的資源已經超負荷運行了。基于這個原因，在Oracle 10g里已經把PX Deq Credit: send blkd等待時間不再視為空閑等待，而是列入了Others 等待事件范圍。?
PX Deq Credit: send blkd 等待事件的意思是：當并行服務進程向并行協調進程QC（也可能是上一層的并行服務進程）發送消息時，同一時間只有一個并行服務進程可以向上層進程發送消息，這時候如果有其他的并行服務進程也要發送消息，就只能等待了。直到獲得一個發送消息的信用信息（Credit），這時候會觸發這個等待事件，這個等待事件的超時時間為2秒鐘。?
如果我們啟動了太多的并行進程，實際上系統資源（CPU）或者QC 無法即時處理并行服務發送的數據，那么等待將不可避免。對于這種情況，我們就需要降低并行處理的并行度。?
當出現PX Deq Credit：send blkd等待的時間很長時，我們可以通過平均等待時間來判斷等待事件是不是下層的并行服務進程空閑造成的。該等待事件的超時時間是2秒，如果平均等待時間也差不多是2秒，就說明是下層的并行進程“無事所做”，處于空閑狀態。如果和2秒的差距很大，就說明不是下層并行服務超時導致的空閑等待，而是并行服務之間的競爭導致的，因為這個平均等待事件非常短，說明并行服務進程在很短時間的等待之后就可以獲取資源來處理數據。?
所以對于非下層的并行進程造成的等待，解決的方法就是降低每個并行執行的并行度，比如對象（表，索引）上預設的并行度或者查詢Hint 指定的并行度。?
并行執行的使用范圍?
Parallel Query( 并行查詢 )
并行查詢可以在查詢語句，子查詢語句中使用，但是不可以使用在一個遠程引用的對象上（如DBLINK）。?
一個查詢能夠并行執行，需要滿足以下條件?
1、SQL語句中有Hint提示，比如Parallel或者Parallel_index。?
2、SQL語句中引用的對象被設置了并行屬性。?
3、多表關聯中，至少有一個表執行全表掃描（Full table scan）或者跨分區的Index range SCAN。如：
select /*+parallel(t 4)×/ * from t;?
Parallel DDL（并行DDL操作，如建表，建索引等）?
表的并行操作
CREATE TABLE table_name parallel 4 AS SELECT?....
ALTER TABLE table_name move partition?partition_name parallel 4;
Alter table table_name split partition?partition_name ...... parallel 4;?
Alter table?table_name?coalesce partition?parallel 4;?
DDL操作，我們可以通過trace 文件來查看它的執行過程。?
alter session set events '10046 trace name context forever,level 12';?
alter session set events '10046 trace name context off' ;?
創建索引的并行執行?
創建索引時使用并行方式在系統資源充足的時候會使性能得到很大的提高，特別是在OLAP系統上對一些很大的表創建索引時更是如此。以下的創建和更改索引的操作都可以使用并行：?
Create index?index_name on table_name(colum_name) parallel 4;
Alter index index_name rebuild?parallel 4
Alter index?index_name?rebuild partition?partition_name parallel 4;
Alter index?index_name?split partition?partition_name .... parallel 4;?
注意：索引上的并行度只有在訪問索引的時候才可以被使用。?
總結：?
使用并行方式，不論是創建表，修改表，創建索引，重建索引，他們的機制都是一樣的，那就是Oracle 給每個并行服務進程分配一塊空間，每個進程在自己的空間里處理數據，最后將處理完畢的數據匯總，完成SQL的操作。?
Parallel DML（并行DML操作，如insert，update，delete等）?
Oracle 可以對DML操作使用并行執行，但是有很多限制。如果我們要讓DML操作使用并行執行，必須顯示地在會話里執行如下命令：?
SQL> alter session enable parallel dml;?
會話已更改。?
只有執行了這個操作，Oracle 才會對之后符合并行條件的DML操作并行執行，如果沒有這個設定，即使SQL中指定了并行執行，Oracle也會忽略它。?
delete，update和merge樣例如下?
delete /*+ parallel(table_name 4) */ from test;
update/*+ parallel(table_name 4) */ test set id=100;
merge /*+ parallel(table_name 4) */ into table_name ...
Oracle 對Delete,update,merge的操作限制在，只有操作的對象是分區表示，Oracle才會啟動并行操作。原因在于，對于分區表，Oracle 會對每個分區啟用一個并行服務進程同時進行數據處理，這對于非分區表來說是沒有意義的。分區表的并行屬性只能在表級別設置，不能在分區級別設置。
注:經筆者測試中非分區表下，?Delete,update,merge也可以執行并行操作，不知道是版本原因還是其他原因，待考證?
Insert 的并行操作?
實際上只有對于insert into … select … 這樣的SQL語句啟用并行才有意義。對于insert into .. values… 并行沒有意義，因為這條語句本身就是一個單條記錄的操作。?
Insert 并行常用的語法是：?
Insert /*+parallel(t 2) */ into t select /*+parallel(t1 2) */ * from t1;?
這條SQL 語句中，可以讓兩個操作insert 和select 分別使用并行，這兩個并行是相互獨立，互補干涉的，也可以單獨使用其中的一個并行。也有如下語法
Insert into t select /*+parallel(t1 2) */ * from t1;?
Insert /*+parallel(t 2) */ into t select? * from t1;?
并行執行的設定?
并行相關的初始話參數?
parallel_min_servers=n?
在初始化參數中設置了這個值，Oracle 在啟動的時候就會預先啟動N個并行服務進程，當SQL執行并行操作時，并行協調進程首先根據并行度的值，在當前已經啟動的并行服務中條用n個并行服務進程，當并行度大于n時，Oracle將啟動額外的并行服務進程以滿足并行度要求的并行服務進程數量。?
parallel_max_servers=n?
如果并行度的值大于parallel_min_servers或者當前可用的并行服務進程不能滿足SQL的并行執行要求，Oracle將額外創建新的并行服務進程，當前實例總共啟動的并行服務進程不能超過這個參數的設定值。默認值為PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5
parallel_adaptive_multi_user=true|false?
Oracle 10g R2下，并行執行默認是啟用的。這個參數的默認值為true，它讓Oracle根據SQL執行時系統的負載情況，動態地調整SQL的并行度，以取得最好的SQL執行性能。

PARALLEL_DEGREE_POLICY
PARALLEL_DEGREE_POLICY specifies whether or not automatic degree of Parallelism, statement queuing, and in-memory parallel execution will be enabled.Values:

MANUAL:?Disables automatic degree of parallelism, statement queuing, and in-memory parallel execution. This reverts the behavior of parallel execution to what it was prior to Oracle Database 11g Release 2 (11.2). This is the default.
LIMITED:?Enables automatic degree of parallelism for some statements but statement queuing and in-memory Parallel Execution are disabled. Automatic degree of parallelism is only applied to those statements that access tables or indexes decorated explicitly with the DEFAULT degree of parallelism using the PARALLEL clause. Statements that do not access any tables or indexes decorated with the DEFAULT degree of parallelism will retain the MANUAL behavior.
AUTO: Enables automatic degree of parallelism, statement queuing, and in-memory parallel execution.

Note: If the table parallel degree has been set as "default",? sql would compute its own DOP regardless parallel_degree_policy is MANUAL or LIMITED. When parallel_degree_policy=AUTO, it would trigger automatically parallel degree?.
parallel_min_percent?
這個參數指定并行執行時，申請并行服務進程的最小值，它是一個百分比，比如我們設定這個值為50. 當一個SQL需要申請20個并行進程時，如果當前并行服務進程不足，按照這個參數的要求，這個SQL比如申請到20*50%=10個并行服務進程，如果不能夠申請到這個數量的并行服務，SQL將報出一個ORA-12827的錯誤。當這個值設為Null時，表示所有的SQL在做并行執行時，至少要獲得兩個并行服務進程。
Note:當parallel_degree_policy=AUTO時，如果沒有獲得申請的進程，則該并行sql則會進入隊列

PARALLEL_FORCE_LOCAL
This parameter specifies whether a SQL statement executed in parallel is restricted to a single instance in an Oracle RAC environment. By setting this parameter to TRUE, you restrict the scope of the parallel server processed to the single Oracle RAC instance where the query coordinator is running.
The recommended value for the PARALLEL_FORCE_LOCAL parameter is FALSE.
PARALLEL_DEGREE_POLICY
該參數為11g的新增參數，有如下3個屬性值：
manual，默認值，表示不自動調節并行度。
auto，自動調節并行度。
limited，對于在相關表或者索引上已經定義了并行度的查詢進行自動并行度調節，沒有在相關表或者索引上已經定義并行度的查詢不進行自動并行度調節。
PARALLEL_THREADS_PER_ CPU
一個CPU 在并行執行過程中可處理的進程或線程的數量，并優化并行自適應算法和負載均衡算法。如果計算機在執行一個典型查詢時有超負荷的跡象，應減小該數值為任何非零值。根據操作系統而定 (通常為 2)
parallel_automatic_tuning
如果設置為 TRUE，Oracle 將為控制并行執行的參數確定默認值。除了設置該參數外，你還必須為系統中的表設置并行性。以在10g中廢棄，只為向下兼容保留。

PARALLEL_MIN_TIME_THRESHOLD
This parameter specifies the minimum execution time a statement should have before the statement is considered for automatic degree of parallelism. By default, this is set to 10 seconds. Automatic degree of parallelism is only enabled if PARALLEL_DEGREE_POLICY is set to AUTO or LIMITED. The syntax is:
PARALLEL_MIN_TIME_THRESHOLD = { AUTO | integer }
The default is AUTO.
PARALLEL_SERVERS_TARGET
This parameter specifies the number of parallel server processes allowed to run parallel statements before statement queuing is used. The default value is:
PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 2
In the formula, the value assigned to concurrent_parallel_users running at the default degree of parallelism on an instance is dependent on the memory management setting. If automatic memory management is disabled (manual mode), then the value of concurrent_parallel_users is 1. If PGA automatic memory management is enabled, then the value of concurrent_parallel_users is 2. If global memory management or SGA memory target is used in addition to PGA automatic memory management, then the value of concurrent_parallel_users is 4.
When PARALLEL_DEGREE_POLICY is set to AUTO, statements that require parallel execution are queued if the number of parallel processes currently in use on the system equals or is greater than PARALLEL_SERVERS_TARGET. This is not the maximum number of parallel server processes allowed on a system (that is controlled by PARALLEL_MAX_SERVERS). However, PARALLEL_SERVERS_TARGET and parallel statement queuing is used to ensure that each statement that requires parallel execution is allocated the necessary parallel server resources and the system is not flooded with too many parallel server processes.
parallel_execution_message_size
指定并行執行 (并行查詢、PDML、并行恢復和復制) 消息的大小。如果值大于 2048 或 4096，就需要更大的共享池。如果 PARALLEL_AUTOMATIC_TUNING =TRUE，將在大存儲池之外指定消息緩沖區。如果PARALLEL_AUTOMATIC_TUNING 為 FALSE，通常值為 2148；如果PARALLEL_AUTOMATIC_TUNING 為 TRUE ，則值為 4096 (根據操作系統而定)。

PARALLEL_DEGREE_LIMIT
With automatic degree of parallelism, Oracle automatically decides whether or not a statement should execute in parallel and what degree of parallelism the statement should use. The optimizer automatically determines the degree of parallelism for a statement based on the resource requirements of the statement. However, the optimizer will limit the degree of parallelism used to ensure parallel server processes do not flood the system. This limit is enforced by PARALLEL_DEGREE_LIMIT
Values:

CPU: The maximum degree of parallelism is limited by the number of CPUs in the system. The formula used to calculate the limit is PARALLEL_THREADS_PER_CPU * CPU_COUNT * the number of instances available (by default, all the opened instances on the cluster but can be constrained using PARALLEL_INSTANCE_GROUP or service specification). This is the default.
IO:?The maximum degree of parallelism the optimizer can use is limited by the I/O capacity of the system. The value is calculated by dividing the total system throughput by the maximum I/O bandwidth per process. You must run the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure on the system in order to use the IO setting. This procedure will calculate the total system throughput and the maximum I/O bandwidth per process.
integer:?A numeric value for this parameter specifies the maximum degree of parallelism the optimizer can choose for a SQL statement when automatic degree of parallelism is active. Automatic degree of parallelism is only enabled if PARALLEL_DEGREE_POLICY is set to AUTO or LIMITED.

并行度可以通過以下三種方式來設定：?
1、使用Hint 指定并行度。?
2、使用alter session force parallel 設定并行度。?
3、使用SQL中引用的表或者索引上設定的并行度，原則上Oracle 使用這些對象中并行度最高的那個值作為當前執行的并行度。?
示例：?
SQL>Select /*+parallel(t 4) */ count(*) from t;?
SQL>Alter table t parallel 4;?
SQL>Alter session force parallel query parallel 4;?
SQL>alter session force parallel query；?
Oracle 默認并行度計算方式：?
1、Oracle 根據CPU的個數，RAC實例的個數以及參數parallel_threads_per_cpu的值，計算出一個并行度。?
2、對于并行訪問分區操作，取需要訪問的分區數為并行度。?
并行度的優先級別從高到低：?
Hint->alter session force parallel->表，索引上的設定-> 系統參數?
實際上，并行只有才系統資源比較充足的情況下，才會取得很好的性能，如果系統負擔很重，不恰當的設置并行，反而會使性能大幅下降。?
如何啟用并行

可以用hint、alter session或者設置對象并行屬性三種方式設置啟用并行。三種方式任意一種就可以使并行生效，如果多種方式同時存在的話，則優先級順序是：hint -> alter session -> table/index degree。

alter session force query parallel 8;

alter session enable parallel dml;

alter session force parallel dml parallel 8;

alter session force parallel ddl parallel 8;

注意：上述的alter session enable只是表示讓當前會話支持并行，最終并行需要通過hint或者table/index degree來實現；而alter session force表示強制并行，無需hint等配合使用。

11g中的并行?
在Oracle 11g R2以及之前的版本，你的SQL執行可能被延遲，直到有充足的并行服務器可用。

Oracle有一種輪換IO機制，叫做“直接路徑IO”，如果它判斷到繞過緩存區直接執行IO會更快速的話，它就會啟用。例如，Oracle在讀寫臨時段進行排序或者整理中間結果集時就會使用直接IO。從Oracle 11g開始，Oracle有時也優先利用直接路徑IO來處理串行表訪問，而不是正常的緩存IO。

在執行并行查詢操作時，Oracle通常會使用直接路徑IO。通過使用直接路徑IO，Oracle可以避免創建高速緩存競爭，并可以使IO更加優化地在并行進程之間分配。此外，對于執行全表掃描的并行操作，在高速緩存找到匹配數據的機會相當低，因此高速緩存幾乎沒有增加什么價值。

在Oracle 10g以及更早的版本，并行查詢總是使用直接路徑IO，而串行查詢將總是使用緩存IO。在11g中，Oracle可以對并行查詢利用緩存IO(從11g R2以后的版本支持)，而且串行查詢也可能利用直接路徑IO。然而，并行查詢仍然不太可能利用緩存IO，因此，可能比串行查詢需要更高的IO成本。當然，更高的IO成本將在所有并行進程之間共享，這樣整體性能仍可能更勝一籌。

并行的優化準則?
從優化串行執行的SQL開始?
一個最理想的并行計劃與最佳的串行計劃可能是有區別的。例如，并行處理通常從表或索引的掃描開始，而最佳串行計劃可能是基于索引查找開始。然而，你應該確保你的查詢在進行并行優化之前先對串行執行進行優化，原因如下：
1、串行調試的結構和方法主要針對索引和統計集合，而這些經常對好的并行調試也非常關鍵。
2、如果請求并行執行的資源不可用，你的查詢可能被串行化(這取決于“PARALLEL_DEGREE_ POLICY”和“PARALLEL_MIN_PERCENT”的設置)。在這種情況下，你要確保你并行查詢的串行計劃足夠好。
缺少調優的SQL甚至可能變成更差的SQL，至少考慮到對其他用戶的影響時是這樣，這使它被允許消耗數據庫服務器更多的CPU和IO資源。
在為并行執行優化SQL語句時，要從未串行執行SQL優化開始。
確保該SQL是合適的并行執行SQL?
不是每個SQL都能從并行執行中獲益的。下面是一些例子，這些情況的SQL語句可能不應該被并行化。
1、串行執行時，執行時間很短的SQL語句。
2、可能在多個會話中高并發率運行的SQL語句。
3、基于索引查找的SQL語句。非并行的索引查找或者范圍掃描不能被并行化。然而，索引全掃描可以被并行化。在分區索引上的索引查找也可以被并行化。
綜上3點，OLTP類型的查詢通常不適合并行化處理。
確保系統適合配置為并行執行?
不是所有的SQL都適合并行執行，也不是所有的數據庫服務器主機適合配置并行處理。在當今世界，大部分物理服務器主機都滿足如下最小需求：多塊CPU和跨多個物理驅動器的數據帶。然而，一些虛擬主機可能不滿足這些最小需求，而桌面計算機通常只有唯一的磁盤設備，因此通常不適合調整為并行執行。
不要嘗試在那些不滿足最小需求(多塊CPU和跨多個磁盤驅動器的數據帶)的計算機系統上使用并行執行。
確保執行計劃的所有部分都被并行化了?
在復雜的并行SQL語句中，很重要的一點是要確保該查詢執行的所有重要步驟都實現了并行。如果某復雜查詢的其中一個步驟是串行執行的，其他并行步驟可能也不得不等待該串行步驟完成，這樣并行機制的優勢就完全喪失了。“PLAN_TABLE”表中的“OTHER_TAG”列用 “PARALLEL_FROM_SERIAL”標記指定了這樣一個步驟，“DBMS_XPLAN”在“IN-OUT”列中記錄了“S->P”。例如：在下面的例子中表“T1”是并行化的，但是表“T”不是。對兩個表的連接和“GROUP BY”包括許多并行操作，但是對“T”表的全表掃描不是并行化的，串到并(S->P)標記展示了“t”行被串行提取到后續并行操作中：
SQL> ALTER TABLE customers PARALLEL(DEGREE 4);
SQL> ALTER TABLE sales NOPARALLEL ;
SQL> explain plan for select /*+ ordered use_hash(t1) */ t.name,sum(t.id) from t,t1 where t.id=t1.object_id group by t.name;
SQL> ? 1* select * from table(dbms_xplan.display)
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2575143521
----------------------------------------------------------------------------------------------------------------------
| Id? | Operation??????????????????? | Name???? | Rows? | Bytes | Cost (%CPU)| Time???? |??? TQ? |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|?? 0 | SELECT STATEMENT???????????? |????????? |??? 61 |? 4758 |??? 22? (10)| 00:00:01 |??????? |????? |??????????? |
|?? 1 |? PX COORDINATOR????????????? |????????? |?????? |?????? |??????????? |????????? |??????? |????? |??????????? |
|?? 2 |?? PX SEND QC (RANDOM)??????? | :TQ10002 |??? 61 |? 4758 |??? 22? (10)| 00:00:01 |? Q1,02 | P->S | QC (RAND)? |
|?? 3 |??? HASH GROUP BY???????????? |????????? |??? 61 |? 4758 |??? 22? (10)| 00:00:01 |? Q1,02 | PCWP |??????????? |
|?? 4 |???? PX RECEIVE?????????????? |????????? |??? 61 |? 4758 |??? 22? (10)| 00:00:01 |? Q1,02 | PCWP |??????????? |
|?? 5 |????? PX SEND HASH??????????? | :TQ10001 |??? 61 |? 4758 |??? 22? (10)| 00:00:01 |? Q1,01 | P->P | HASH?????? |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|?? 6 |?????? HASH GROUP BY????????? |????????? |??? 61 |? 4758 |??? 22? (10)| 00:00:01 |? Q1,01 | PCWP |??????????? |
|*? 7 |??????? HASH JOIN???????????? |????????? |??? 61 |? 4758 |??? 21?? (5)| 00:00:01 |? Q1,01 | PCWP |??????????? |
|?? 8 |???????? BUFFER SORT????????? |????????? |?????? |?????? |??????????? |????????? |? Q1,01 | PCWC |??????????? |
|?? 9 |????????? PX RECEIVE????????? |????????? |??? 61 |? 3965 |??? 12?? (0)| 00:00:01 |? Q1,01 | PCWP |??????????? |
|? 10 |?????????? PX SEND BROADCAST? | :TQ10000 |??? 61 |? 3965 |??? 12?? (0)| 00:00:01 |??????? | S->P | BROADCAST? |
|? 11 |??????????? TABLE ACCESS FULL | T??????? |??? 61 |? 3965 |??? 12?? (0)| 00:00:01 |??????? |????? |??????????? |
|? 12 |???????? PX BLOCK ITERATOR??? |????????? | 52078 |?? 661K|???? 8?? (0)| 00:00:01 |? Q1,01 | PCWC |??????????? |
|? 13 |????????? INDEX FAST FULL SCAN| T1_IDX?? | 52078 |?? 661K|???? 8?? (0)| 00:00:01 |? Q1,01 | PCWP |??????????? |
----------------------------------------------------------------------------------------------------------------------
像前面這種情況，部分并行化執行計劃可能會導致兩方面效果都很差：消耗的時間并沒有改善，因為串行操作形成了整個執行的瓶頸。然而，該SQL還捆綁了并行服務器進程，而且可能影響其他并發執行SQL的性能。
如果我們為表“t”設置一個默認的并行度，該串行瓶頸將會消失。對“t”表的全掃描現在是按并行執行了，而且“串到并S->P”瓶頸將被全并行的“并到并P->P”操作替代
在優化并行執行計劃時，要確保所有相關步驟都在并行執行：“DBMS_XPLAN”中的串到并S->P 標記或者“PLAN_TABLE”中的“PARALLEL_FROM_SERIAL”通常指示在并行計劃的某些方面存在串行瓶頸。
Monitor parallel execution
SQL> select * from v$pq_sesstat;
STATISTIC????????????????????? LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized??????????????????? 1??????????? 13
DML Parallelized??????????????????????? 0???????????? 0
DDL Parallelized??????????????????????? 0???????????? 0
DFO Trees?????????????????????????????? 1??????????? 13
Server Threads????????????????????????? 6???????????? 0
Allocation Height?????????????????????? 3???????????? 0
Allocation Width??????????????????????? 1???????????? 0
Local Msgs Sent?????????????????????? 362??????? 303740
Distr Msgs Sent???????????????????????? 0???????????? 0
Local Msgs Recv'd???????????????????? 368??????? 303782
Distr Msgs Recv'd?????????????????????? 0???????????? 0
11 rows selected.
SQL> select? DFO_NUMBER, TQ_ID, SERVER_TYPE,? NUM_ROWS ,BYTES,process from v$pq_tqstat order by dfo_number , tq_id , server_type;
DFO_NUMBER????? TQ_ID SERVER_TYPE?????? NUM_ROWS????? BYTES PROCESS
---------- ---------- --------------- ---------- ---------- --------
???????? 1????????? 0 Consumer???????????? 17151???? 100454 P001
???????? 1????????? 0 Consumer???????????? 17242???? 100969 P002
???????? 1????????? 0 Consumer???????????? 17257???? 101058 P000
???????? 1????????? 0 Producer????????????? 1971?????? 9955 P004
???????? 1????????? 0 Producer???????????? 29565???? 174989 P005
???????? 1????????? 0 Producer???????????? 20114???? 117537 P003
???????? 1????????? 1 Consumer?????????????? 893?????? 8107 P002
???????? 1????????? 1 Consumer????????????? 2914????? 26341 P001
???????? 1????????? 1 Consumer???????????????? 0???????? 60 P000
???????? 1????????? 1 Producer?????????????? 611?????? 5494 P003
???????? 1????????? 1 Producer????????????? 2593????? 23493 P005
???????? 1????????? 1 Producer?????????????? 603?????? 5521 P004
???????? 1????????? 2 Consumer????????????? 3807????? 19040 QC
???????? 1????????? 2 Producer????????????? 2914????? 14545 P001
???????? 1????????? 2 Producer?????????????? 893?????? 4475 P002
???????? 1????????? 2 Producer???????????????? 0???????? 20 P000
16 rows selected.
Obviously, from the v$pq_tqstat, the DOP is downgrade to 3, and have two slave sets.? From the v$tq_sesstat, we can see that what’s the process. How these processes cooperated together to generated the result. In this sample, P003, P004,P005 generate the data and feed to P000,P001,P002, totally twice. Then P000,P001,P002 feed the data to QC, the query coordinator. Also, we can find that P000 at the first time received 17257 rows, but at the second time it received 0 rows, and lastly, it product 0 rows to QC. Why?
Let’s look at the execution plan for further investigation.
SQL> @utlxplp
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1177066807
-----------------------------------------------------------------------------------------------------------------
| Id? | Operation?????????????? | Name???? | Rows? | Bytes | Cost (%CPU)| Time???? |??? TQ? |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|?? 0 | SELECT STATEMENT??????? |????????? |? 3867 |?? 147K|??? 13?? (8)| 00:00:01 |??????? |????? |??????????? |
|?? 1 |? PX COORDINATOR???????? |????????? |?????? |?????? |??????????? |????????? |??????? |????? |??????????? |
|?? 2 |?? PX SEND QC (RANDOM)?? | :TQ10002 |? 3867 |?? 147K|??? 13?? (8)| 00:00:01 |? Q1,02 | P->S | QC (RAND)? |
|*? 3 |??? HASH JOIN BUFFERED?? |????????? |? 3867 |?? 147K|??? 13?? (8)| 00:00:01 |? Q1,02 | PCWP |??????????? |
|?? 4 |???? PX RECEIVE????????? |????????? | 53332 |?? 677K|???? 7?? (0)| 00:00:01 |? Q1,02 | PCWP |??????????? |
|?? 5 |????? PX SEND HASH?????? | :TQ10000 | 53332 |?? 677K|???? 7?? (0)| 00:00:01 |? Q1,00 | P->P | HASH?????? |
|?? 6 |?????? PX BLOCK ITERATOR |????????? | 53332 |?? 677K|???? 7?? (0)| 00:00:01 |? Q1,00 | PCWC |??????????? |
|?? 7 |??????? TABLE ACCESS FULL| PRODUCTS | 53332 |?? 677K|???? 7?? (0)| 00:00:01 |? Q1,00 | PCWP |??????????? |
|?? 8 |???? PX RECEIVE????????? |????????? |? 3867 |??? 98K|???? 5?? (0)| 00:00:01 |? Q1,02 | PCWP |??????????? |
|?? 9 |????? PX SEND HASH?????? | :TQ10001 |? 3867 |??? 98K|???? 5?? (0)| 00:00:01 |? Q1,01 | P->P | HASH?????? |
|? 10 |?????? PX BLOCK ITERATOR |????????? |? 3867 |??? 98K|???? 5?? (0)| 00:00:01 |? Q1,01 | PCWC |??????????? |
|? 11 |??????? TABLE ACCESS FULL| COSTS??? |? 3867 |??? 98K|???? 5?? (0)| 00:00:01 |? Q1,01 | PCWP |??????????? |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
?? 3 - access("C"."PROD_ID"="P"."PROD_ID")
This is a hash-hash parallel distribute, using hash joining. firstly, Slave set(Q1,00) scan products table and send the results to slave set(Q1,02), and slave set(Q1,01) scan the costs table and send results to slave set(Q1,02), which slave in slave set(Q1,02) to send to is dependent on the hash value of join key. as the table costs only have two distinct values for prod_id, so the rows in costs only have two target slave to send to, resulting one slave of the slave set (Q1,02) receive 0 rows. After the slaves in (Q1,02) received? rows from the other two slave sets, they hash join the rows and send the result to QC, obviously, one slave will have no rows to process.
This is why 0 rows in the above output.
Also, you can enable the trace for parallel execution, through this way:
alter session set events '10132 trace name context forever';

Compute the variance of bytes grouped by TQ_ID. Large variances indicate workload imbalances. You should investigate large variances to determine whether the producers start out with unequal distributions of data, or whether the distribution itself is skewed. If the data itself is skewed, this might indicate a low cardinality, or low number of distinct values.

Some Common Sense

One of my favourite descriptions of performance tuning, althou gh I can’t remember where I first heard it, is that it is based on ‘informed common sense’. That really captures my ow n experiences of performance tu ning. Yes, you need to use proper analysis techniques and often a great deal of technical knowledge, but that’s all devalued if you’re completely missing the point . So let’s take a step away from the technical and consider the big picture.
Don’t even think about implementing Parallel Execution un less you are prepared to invest some time in initial testing, followed by ongoing performance monitoring. If you don’t, you might one day hit performance problems either server-wide or on an individual user session that you’d never believe (until it happens to you).
Parallel Execution is designed to utilise hardware as heavily as possible. If you are running on a single-CPU server with two hard disk drives and 512Mb RAM, don’t expect significant perfor mance improvements just because you switch PX on. The more CPUs , disk drives, controllers and RAM you have installed on your server, the better the results are going to be.
Although you may be able to use Parallel Execution to make an inefficient SQL stat ement run many times faster, that would be incredibly stupid. It’s essential that you tune the SQL first . In the end, doing more work than you should be, but more quickly, is still doing more work than you should be! To put it another way, don’t use PX as a dressing for a poorly designed application. Reduce the wo rkload to the minimum needed to achieve the task and then start using the server facilities to make it run as quickly as possible. Seems obvious, doesn’t it?
If you try to use PX to benefit a large number of user s performing online queries yo u may eventually bring the server to its knees. Well, maybe not if you use th e Adaptive Multi-User algorithm, but then it’s essential that both you and, more important, your users unders tand that response time is going to be very variable when the machine gets busy.
Using PX for a query that runs in a few seconds is pointless. You’re just going to use more resources on the server for very little improvemen t in the run time of the query. It might well run more slowly!
Sometimes when faced with a slow i/o subsystem you migh t find that higher degrees of parallelism are useful because the CPUs are spending more ti me waiting for i/o to complete. Ther efore they are more likely to be available for another PX slave (that isn’t waiting on i/o) to use. This was certainly my experience at one site. However, it’s also true that using PX will usually lead to a busier i/o subsystem because the server is likely to favour full scans over indexed retrieva l. There are no easy answers here - you really need to carry out some analysis of overall system resource usage to identify where the bottlenecks are an d adjust the configuration accordingly.
Consider whether PX is the correct parallel solution for overnight batch operations. It may be that you can achieve better performance using multip le streams of jobs, each single-threa ded, or maybe you would be better with one stream of jobs which uses PX. It depends on your application so the only sure way to find out is to try the different approaches .

BTW, introduce two useful link about parallel execution.

Oracle Database Parallel Execution Fundamental

http://www.oracle.com/technetwork/articles/datawarehouse/twp-parallel-execution-fundamentals-133639.pdf

How Parallel Execution Works

http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm#VLDBG0101

參考至:《讓Oracle跑得更快》譚懷遠著
?????????? 《讓Oracle跑得更快2》譚懷遠著?
?????????? http://howe.im/%E6%95%B0%E6%8D%AE%E5%BA%93/137.html
?????????? http://www.cnblogs.com/daduxiong/archive/2010/08/24/1807427.html
?????????? http://www.searchdatabase.com.cn/showcontent_44070.htm
?????????? http://xuguohao.gotoip55.com/?p=142
?????????? http://blog.csdn.net/tianlesoftware/article/details/5854583
?????????? http://docs.oracle.com/cd/E11882_01/server.112/e17110/initparams176.htm
?????????? http://space.itpub.net/17203031/viewspace-696389
???????????http://os2ora.com/monitor-parallel-execution/
???????????http://oracledoug.com/px.pdf
???????????http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel006.htm#VLDBG1513

???????????http://www.oracle.com/technetwork/articles/datawarehouse/twp-parallel-execution-fundamentals-133639.pdf
???????????http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel005.htm#CIHHGECI
???????????http://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams181.htm#REFRN10310
???????????http://www.cnblogs.com/luoyx/archive/2011/12/26/2302587.html
本文原創，轉載請注明出處、作者
如有錯誤，歡迎指正
郵箱:czmcj@163.com