死事務出現在異常關閉數據庫或者事務進程不正常結束,比如KILL -9,shutdown abort的情況下。
當前數據庫里的死事務可以通過查詢內部表x$ktuxe來獲得。
select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL='DEAD';
ADDR?????????????? KTUXEUSN?? KTUXESLT?? KTUXESQN?? KTUXESIZ
---------------- ---------- ---------- ---------- ----------
00002B92FF5D5F68???????? 15???????? 12???? 314961??????43611
KTUXESIZ代表需要回滾的回滾塊數。
死事務的回滾進程數可以通過參數fast_start_parallel_rollback來設置。
show parameter fast
NAME???????????????????????????????? TYPE?????????????????? VALUE
------------------------------------ ---------------------- ------------------------------
fast_start_io_target???????????????? integer??????????????? 0
fast_start_mttr_target?????????????? integer??????????????? 120
fast_start_parallel_rollback???????? string???????????????? LOW
low的設置,會使當產生死事務的時候啟用2*CPU數個并行回滾。
我們來模擬一個事務被kill掉的情況,在delete的過程中,把這個事務的進程kill掉:
delete from test;
ERROR:
ORA-03114: not connected to ORACLE
delete from test
*
ERROR at line 1:
ORA-12152: TNS:unable to send break message
然后觀察數據庫后臺的等待事件,發現啟動了很多的回滾進程。
SID SPID?????? EVENT??????????????????????????????? P1???????? P2???????? P3 SQL_ID???????????? SECON
------- ---------- ------------------------- ------------- ---------- ---------- ------------------ -----
1707 2323?????? SQL*Net message to client??? 1650815232????????? 1????????? 0 3t37hp1cnkuux????? 0
762 2312?????? read by other session???????????????? 3???? 298397???????? 36??????????????????? 0
1327 2286?????? read by other session???????????????? 3???? 298367???????? 36??????????????????? 0
382 2308?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
384 2274?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
573 2276?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
574 2310?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
762 2312?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
763 2278?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
950 2280?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
951 2314?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1139 2282?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1141 2316?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1516 2320?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1517 2284?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1518 2318?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1519 2288?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1705 2290?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
191 2306?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
1892 2258?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
2 2270?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
4 2304?????? wait for a undo record??????????????? 0????????? 0????????? 0??????????????????? 0
top - 21:47:42 up 16 days,? 3:30,? 2 users,? load average: 3.44, 1.94, 1.35
Tasks: 295 total,?? 1 running, 294 sleeping,?? 0 stopped,?? 0 zombie
Cpu(s):? 2.6%us,? 1.2%sy,? 0.0%ni, 79.6%id, 16.3%wa,? 0.0%hi,? 0.2%si,? 0.0%st
Mem:? 24679196k total, 20316832k used,? 4362364k free,?? 666864k buffers
Swap: 20482864k total,???? 3004k used, 20479860k free,? 6074052k cached
PID USER????? PR? NI? VIRT? RES? SHR S %CPU %MEM??? TIME+? COMMAND
23773 oracle??? 16?? 0 1680m 100m? 45m D 11.6? 0.4?? 0:05.95 ora_arc0_xdbsb
2258 oracle??? 16?? 0 1600m? 17m? 14m S? 6.0? 0.1?? 0:01.56 ora_p000_xdbsb
15652 root????? 10? -5???? 0??? 0??? 0 D? 4.6? 0.0?? 0:10.26 [kjournald]
23642 oracle??? 15?? 0 1605m? 31m? 25m S? 2.7? 0.1?? 0:03.51 ora_smon_xdbsb
2264 oracle??? 15?? 0 1600m? 17m? 14m S? 1.7? 0.1?? 0:00.32 ora_p003_xdbsb
2266 oracle??? 15?? 0 1600m? 17m? 14m S? 1.7? 0.1?? 0:00.31 ora_p004_xdbsb
2306 oracle??? 16?? 0 1600m? 17m? 14m S? 1.7? 0.1?? 0:00.33 ora_p024_xdbsb
2262 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p002_xdbsb
2268 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p005_xdbsb
2270 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p006_xdbsb
2274 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p008_xdbsb
2280 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p011_xdbsb
2282 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p012_xdbsb
2286 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p014_xdbsb
2292 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p017_xdbsb
2298 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.32 ora_p020_xdbsb
2310 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.32 ora_p026_xdbsb
2312 oracle??? 16?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.31 ora_p027_xdbsb
2314 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.32 ora_p028_xdbsb
2318 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.30 ora_p030_xdbsb
2320 oracle??? 15?? 0 1600m? 17m? 14m S? 1.3? 0.1?? 0:00.32 ora_p031_xdbsb
2260 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.28 ora_p001_xdbsb
2272 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.30 ora_p007_xdbsb
2276 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.31 ora_p009_xdbsb
2278 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.30 ora_p010_xdbsb
2284 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.30 ora_p013_xdbsb
2288 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.33 ora_p015_xdbsb
2290 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.29 ora_p016_xdbsb
2294 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.33 ora_p018_xdbsb
2296 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.31 ora_p019_xdbsb
2300 oracle??? 16?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.33 ora_p021_xdbsb
2302 oracle??? 16?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.32 ora_p022_xdbsb
2304 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.30 ora_p023_xdbsb
2308 oracle??? 16?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.31 ora_p025_xdbsb
2316 oracle??? 15?? 0 1600m? 17m? 14m S? 1.0? 0.1?? 0:00.31 ora_p029_xdbsb
23634 oracle??? 15?? 0 1609m? 29m? 17m S? 0.7? 0.1?? 0:06.28 ora_dbw0_xdbsb
23636 oracle??? 15?? 0 1607m? 26m? 15m S? 0.7? 0.1?? 0:05.99 ora_dbw1_xdbsb
2321 oracle??? 15?? 0 12872 1252? 816 R? 0.3? 0.0?? 0:00.07 top
由于我的服務器上的CPU數較多,ORACLE啟動了2*CPU數個回滾進程,這些ora_pxxx_xdbsb都是后臺啟動的并行回滾的進程。經常會發現回滾的進程間會產生資源的爭用,例如buffer busy waits等待事件。導致回滾的速度非常慢,我們可以通過調整參數??fast_start_parallel_rollback?為false,這樣回滾的進程就只會有一個,速度有時候反而比并行回滾還快。而且也會解決回滾進程數太多,導致回滾進程占用了太多的系統資源。大事務運行過程中被異常kill掉是一件比較嚴重的事情,死事務的回滾可能會占用掉你很多的系統資源。
回滾過程中,我們通過觀察x$ktuxe.ktuxesiz減少的速度來評估回滾的速度。可以根據以下算法來粗略的估算回滾需要的時間,這里是小時:
declare
l_start number;
l_end?? number;
begin
select ktuxesiz
into l_start
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ---------這里根據實際數字來填寫
dbms_lock.sleep(60);? ---------可以縮小這個時間,但是太小,可能會導致誤差較大
select ktuxesiz
into l_end
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ---------這里根據實際數字來填寫
dbms_output.put_line('time cost Day:' ||
round(l_end / (l_start - l_end) / 60, 2));
end;
/