前言
昨天晚上深夜接到客戶電話,反應數據庫無法正常使用,想進入服務器檢查時,登錄響應非常慢。等兩分鐘后進入服務器且通過sqlplus進入數據庫也很慢。通過檢查服務器磁盤空間發現數據庫所在區已經爆滿,導致數據庫在運行期間新增審計文件無法成功導致數據庫宕機。客戶說按照之前的策略,正常每天做完備份后均會刪除七天前的歸檔日志。磁盤空間會保證一定程度的可用量。可最近直線上升。
問題
1、客戶反應磁盤空間爆滿不符合當前數據庫的整體策略?
2、為什么會出現磁盤爆滿的情況?
思考方向
1、磁盤空間爆滿是因為數據庫在運行期間,會不斷產生日志文件(包括alert、trace、listener、audit等),同時如果數據文件使用了自動擴展也會占用磁盤空間。如果不定期清理日志文件和對數據文件進行瘦身,只會導致數據庫占比空間越來越大。且客戶有提到定期刪除日志的現象,所以暫時排除該現象
2、磁盤空間爆滿不符合數據庫的整體策略。數據庫針對不同方向可定制不同策略。如備份策略、日志文件定期刪除策略。據客戶反應,他們擁有備份和日志文件定期刪除策略。也可以暫時排除該方向
檢查
1、檢查發現數據庫確實存在備份策略和日志文件定期刪除策略,且定時任務正常執行。檢查以上策略的日志文件發現。日志文件定期刪除策略中的日志記錄正常執行,但是備份策略中的歸檔日志無法刪除。
2、數據庫中的歸檔日志無法刪除。按照當前客戶配置來看,不存在RAC環境。那么最大的可能就是DG庫。那么問題就顯而易見了,當存在DG庫時,如果從庫無法接收主庫所產生的歸檔日志,那么主庫的歸檔日志將無法刪除,且會持續增長并占用主庫磁盤空間。經檢查從庫磁盤空間,確實是從庫磁盤空間爆滿導致無法接收主庫日志進行應用。所以主庫無法刪除歸檔日志。
查詢從庫是否存在GAP
1、從庫查詢GAP
SQL> select * from v$archive_gap;THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------1 332 334
2、查詢從庫已應用的日志
# 查詢應用完成的日志
SQL> SELECT max(sequence#) from v$archived_log where applied='YES';MAX(SEQUENCE#)
--------------332
經查發現,從庫由于磁盤空間爆滿無法應用日志導致GAP產生。開始修復
刪除從庫已經應用的日志
注意:此處刪除的日志應盡最大程度符合備份策略中對于歸檔日志刪除的保留期
# 刪除七天前的日志
find . -type f -mtime +7 -exec rm -rf {} \;
從庫重新注冊未應用的歸檔日志
RMAN>catalog start with '/data/archive_log_332.log';
RMAN>catalog start with '/data/archive_log_333.log';
RMAN>catalog start with '/data/archive_log_334.log';
從庫應用GAP日志
RMAN>restore archivelog sequence between 332 and 334;
從庫開啟日志同步
SQL>alter database recover managed standby database using current logfile disconnect from session;
校驗
經檢查發現依然存在GAP。所以此處使用備份進行修復
QL> select process,client_process,sequence#,status,BLOCK#,BLOCKS,GROUP#,delay_mins from v$managed_standby;PROCESS CLIENT_P SEQUENCE# STATUS BLOCK# BLOCKS
--------- -------- ---------- ------------ ---------- ----------
GROUP# DELAY_MINS
---------------------------------------- ----------
ARCH ARCH 0 CONNECTED 0 0
N/A 0ARCH ARCH 443 CLOSING 12288 1610
5 0ARCH ARCH 0 CONNECTED 0 0
N/A 0PROCESS CLIENT_P SEQUENCE# STATUS BLOCK# BLOCKS
--------- -------- ---------- ------------ ---------- ----------
GROUP# DELAY_MINS
---------------------------------------- ----------
ARCH ARCH 444 CLOSING 1 321
4 0RFS ARCH 0 IDLE 0 0
N/A 0RFS LGWR 445 IDLE 25 1
1 0PROCESS CLIENT_P SEQUENCE# STATUS BLOCK# BLOCKS
--------- -------- ---------- ------------ ---------- ----------
GROUP# DELAY_MINS
---------------------------------------- ----------
RFS UNKNOWN 0 IDLE 0 0
N/A 0MRP0 N/A 335 wait_for_gap 25 102400
N/A 0
修復方法
1、查詢從庫當前SCN
SQL> select to_char(current_scn) from v$database;TO_CHAR(CURRENT_SCN)
----------------------------------------
993872
2、從庫斷開日志同步
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
3、從庫關機并啟動到nomount狀態
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>startup nomount;
4、主庫執行備份
run{
allocate channel c1 device type disk;
allocate channel c2 device type disk;
backup as backupset incremental from scn 993872 database format '/data/db/backup/zengliang_%u.bak';
backup current controlfile for standby format '/data/db/backup/standby.bak';
release channel c1;
release channel c2;
}
5、從庫恢復控制文件
RMAN> restore standby controlfile from '/data/db/backup/standby.bak';Starting restore at 14-MAY-25
using channel ORA_DISK_1channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=/data/db/oradata/master/control01.ctl
output file name=/data/db/oradata/master/control02.ctl
Finished restore at 14-MAY-25
6、從庫注冊備份
RMAN> catalog start with '/data/db/backup/';
searching for all files that match the pattern /data/db/backup/
List of Files Unknown to the Database
=====================================
File Name: /data/db/backup/standby.bak
File Name: /data/db/backup/zengliang_0e3peq3p.bak
File Name: /data/db/backup/zengliang_0f3peq3p.bak
File Name: /data/db/backup/zengliang_0g3peq4s.bak
7、從庫恢復數據庫
RMAN> recover database noredo;Starting recover at 14-MAY-25
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /data/db/oradata/master/system01.dbf
destination for restore of datafile 00003: /data/db/oradata/master/undotbs01.dbf
destination for restore of datafile 00005: /data/db/oradata/master/apps.dbf
channel ORA_DISK_1: reading from backup piece /data/db/backup/zengliang_0e3peq3p.bak
channel ORA_DISK_1: piece handle=/data/db/backup/zengliang_0e3peq3p.bak tag=TAG20250514T161657
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00002: /data/db/oradata/master/sysaux01.dbf
destination for restore of datafile 00004: /data/db/oradata/master/users01.dbf
destination for restore of datafile 00006: /data/db/oradata/master/test.dbf
channel ORA_DISK_1: reading from backup piece /data/db/backup/zengliang_0f3peq3p.bak
channel ORA_DISK_1: piece handle=/data/db/backup/zengliang_0f3peq3p.bak tag=TAG20250514T161657
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07Finished recover at 14-MAY-25
8、從庫清理redo日志組
SQL> alter database clear logfile group 1;
SQL> alter database clear logfile group 2;
SQL> alter database clear logfile group 3;
9、從庫打開數據庫
SQL> alter database open read only;
10、從庫開啟日志應用
SQL>alter database recover managed standby database using current logfile disconnect from session;
定期刪除腳本–OS級別
注意:使用該腳本并加入定時計劃,可實現定期自動化刪除歸檔日志,同理其它文件也可以使用該腳本
logpath="/data/scripts/new_scripts"
CURTIME=`date +%Y%m%d%H%M%S`
delF_log="$logpath/del_file_list_15.log"export week=`date +%w`
if [ ! -f $delF_log ]; then
touch $delF_log
chmod 777 $delF_log
else
rm -rf $delF_log
touch $delF_log
chmod 777 $delF_log
fiif [ $week -ne 0 ] ; thenecho "Today cannot Sunday:$week" >> $delF_logexit 0
fipad=$(printf '%0.1s' "-"{1..60})folList[0]="/back/prod/New_Coldback"
echo "Delete 15 days before file " >> $delF_log
echo "Perform Date :"$CURTIME >> $delF_log
echo "$pad" >>$delF_log
for i in ${folList[@]}
do
echo "Location is : $i " >>$delF_log
find $i -type d -mtime +15 >> $delF_log
find $i -type f -mtime +15 >> $delF_log
find $i -type f -mtime +15 -exec rm -rf {} \;
done
echo "$pad" >>$delF_log
echo "Complete Time :"$CURTIME >> $delF_log