Oracle19C低版本一天遭遇兩BUG（ORA-04031/ORA-600）

昨天幫朋友看一個系統異常卡頓的案例，在這里分享給大家

環境：Exadata X8M? 數據庫版本19.11

1.系統報錯信息

表象為系統卡頓，頁面無法刷出，登陸到主機上看到節點1 系統等待存在大量的?cursor: pin S wait on X等待

查看兩個節點的alert log 看到有大量的ORA-04031報錯?

2025-04-15T14:43:53.522183+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_m000_342669.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01000: maximum open cursors exceeded
2025-04-15T14:44:50.515707+08:00
DDE: Problem Key 'ORA 4031' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
2025-04-15T14:46:29.968518+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_mz08_287162.trc:
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")
2025-04-15T14:46:30.005517+08:00
Process MZ08 died, see its trace filetrace file信息
Trace file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_mz08_287162.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0
*** 2025-04-15T14:46:29.968405+08:00 (CDB$ROOT(1))
<error barrier> at 0x7ffceea153e8 placed ksv.c@7147
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")
OPIRIP: Uncaught error 447. Error stack:*** 2025-04-15T14:46:29.969405+08:00
ORA-00447: fatal error in background process
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")

MOS新加AI搜索能力

ORA-04031涉及到的bug非常多，遇到這類問題優先查MOS，這里簡單介紹一下MOS新推出的 AI搜索能力，可以根據你提供的報錯比較精準的給出搜索結果，并根據相關性將引用的文檔列出來，這個功能非常好

根據AI的提示簡單翻譯一下

請注意：服務提供的信息可能存在不準確之處，應進行獨立驗證。本解答適用于：Oracle Database - Enterprise Edition - 版本19.3.0.0.0至19.14.0.0.0

錯誤消息ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2798","kglseshtTable")表明數據庫無法從共享池分配所需的共享內存。

根據《數據庫錯誤消息》文檔，該錯誤的成因是所需共享內存超過了當前可用量。

解決方案如下：

增加初始化參數SGA_TARGET、MEMORY_SIZE或MEMORY_TARGET的值
若無法增加上述參數，則降低DB_CACHE_SIZE的值（如已設置）
若未使用SGA_TARGET、MEMORY_SIZE或MEMORY_TARGET，則增加對應內存池的大小：
- 共享池不足時增加SHARED_POOL_SIZE
- IMC堆不足時增加INMEMORY_SIZE

另一份相關文檔《ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1489","kglseshtTable")》提供了更具體的錯誤分析。

該錯誤由產品缺陷引起，已在未公開的Bug 33647820中記錄。解決方案為：

應用19.15或更高版本的補丁集（該問題已標記為修復）
或應用適用于您版本和平臺的補丁33647820（如存在）

檢查補丁沖突請使用My Oracle Support (MOS)的Patch Planner工具。若對應平臺和版本無可用補丁，請聯系Oracle技術支持申請修復。

2.原因

很明顯這個報錯是因為觸發了一個為 unpublished 的Bug ，根據文檔?ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1489","kglseshtTable") (Doc ID 2923103.1)

這個報錯和文檔中的描述完全一致，大概猜測是數據庫實例運行太久造成的share pool imbalance，該問題在19.15被解決，19.15之前版本BUG還是挺多的，如果有條件可以考慮升級到19.20+

話說Exadata穩定性還是非常強大的，這臺機器從入場至今1387天（接近四年了），沒有重啟過，出問題這個實例也沒有重啟過，如果不是這次遭遇BUG 應該還能跑很久。

3.解決方案

根據以上MOS的信息，有一下幾種處理方式

緊急處理方式，強制刷新share pool

alter system flush shared_pool;

或者重啟數據庫?很多內存相關的bug可以通過重啟數據庫來解決，畢竟打補丁現在來不及；我這里選擇的處理方案是輪流重啟兩個節點；

然而因為這個實例已經運行了太久 shutdown用了好長時間，并有大量pid 都需要手動kill，一下報出幾百個PID，還好現在AI比較強大直接將這部分log丟給deepseek，讓它把pid篩選出來就好了。

PDBPRO(6):Active process 279643 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):
PDBPRO(6):Active process 124660 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):
PDBPRO(6):Active process 246421 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):
PDBPRO(6):Active process 224818 user 'grid' program 'oracle@test.com.cn', waiting for 'read by other session'
PDBPRO(6):
PDBPRO(6):Active process 124650 user 'grid' program 'oracle@test.com.cn', waiting for 'SQL*Net message from client'
PDBPRO(6):

4.重啟再遇到BUG

節點2重啟正常，但是在節點1重啟時發現只能mount 無法OPEN 關鍵部分報錯如下

ALTER DATABASE OPEN /* db agent *//* {0:4:346} */2025-04-15T19:47:52.586367+08:00
CTWR started with pid=89, OS id=33744
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc  (incident=246935) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
Incident details in: /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246935/test11_ctwr_33744_i246935.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-04-15T19:47:53.513579+08:00
ORA-04031 heap dump being written to trace file /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246935/test11_ctwr_33744_i246935.trc
2025-04-15T19:47:54.106979+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc  (incident=246936) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [krcpasb_initial_alloc_failure], [3250176], [], [], [], [], [], [], [], [], [], []
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
Incident details in: /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246936/test11_ctwr_33744_i246936.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-04-15T19:47:54.598420+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc:
ORA-00600: internal error code, arguments: [krcpasb_initial_alloc_failure], [3250176], [], [], [], [], [], [], [], [], [], []
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
2025-04-15T19:47:54.598589+08:00
The change tracking error 600.
2025-04-15T19:47:54.598742+08:00
Stopping background process CTWR
2025-04-15T19:47:54.599120+08:00
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc:
ORA-00600: internal error code, arguments: [krcpasb_initial_alloc_failure], [3250176], [], [], [], [], [], [], [], [], [], []
ORA-04031: unable to allocate 52011112 bytes of shared memory ("large pool","unknown object","large pool","CTWR dba buffer")
2025-04-15T19:47:54.600494+08:00
Dumping diagnostic data in directory=[cdmp_20250415194754], requested by (instance=1, osid=33744 (CTWR)), summary=[incident=246936].
Errors in file /u01/app/oracle/diag/rdbms/test1/test11/trace/test11_ctwr_33744.trc  (incident=246937) (PDBNAME=CDB$ROOT):
ORA-487 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/oracle/diag/rdbms/test1/test11/incident/incdir_246937/test11_ctwr_33744_i246937.trc

錯誤發生在啟動?CTWR（Change Tracking Writer）進程時，最終導致?實例終止（instance crash）

?4.1 CTWR 進程啟動失敗

CTWR?是 Change Tracking Writer，用于實現?增量備份變更跟蹤（Block Change Tracking）?功能。它啟動時嘗試在?large pool?中分配大塊內存失敗，引發了 ORA-4031：

"large pool", "CTWR dba buffer"

4.2 ORA-00600 + ORA-4031 的組合說明這是一個嚴重的系統級錯誤?

ORA-00600?[krcpasb_initial_alloc_failure]?是?內部內存分配失敗
錯誤位置在 Oracle kernel 模塊?krcp*?系列，屬于 change tracking 內部模塊
后續的?進程中止、系統狀態轉儲、實例終止?都是級聯故障結果

4.3是否命中 Oracle 官方 Bug？

查MOS 很快找到和這個報錯和BUG??Bug 32428097?高度一致！

Bug 32428097 - ORA-600 [krcpasb_initial_alloc_failure] during CTWR startup

說明：

Oracle 19.x 在使用 change tracking 時，CTWR 在啟動期間分配內存失敗，觸發 ORA-04031 + ORA-00600 + 實例崩潰。
這是 Oracle 確認的回歸問題（Regression Bug）在19.13修復。
常見于：
- 大量數據變更（如恢復、測試環境還原）
- large pool 不足
- 某些版本升級后首次啟用 change tracking

4.4解決方案

暫時關閉block change track

ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;

5.總結

截止至2025年4月16日 Oracle19C已經更新至19.27，我認為至少在未來五年內，19c仍然會是主力版本；當然拉如果沒有遭遇BUG，理論上可以不打補丁的，但是為了系統的穩定，仍然建議將19C升級至19.20+ （保守點19.15+）

附錄oracle各版本支持時間線。

參考文檔：

ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1489","kglseshtTable") (Doc ID 2923103.1)

Bug 32428097 - BCT: CTWR crashes during "_bct_public_dba_buffer_size" reset with ORA-00600 [krcpasb_initial_alloc_failure] & ORA-4031 (Doc ID 32428097.8)

Release Schedule of Current Database Releases (Doc ID 742060.1)