參考文章:https://www.modb.pro/db/70021
概述
為了確保復雜的事務可以安全地同時運行,kingbase(PostgreSQL)提供了各種級別的鎖來控制對各種數據對象的并發訪問,使得對數據庫關鍵部分的更改序列化。事務并發運行,直到它們嘗試獲取互相沖突的鎖為止(比如兩個事務更新同一行時)。當多個事務同時在數據庫中運行時,并發控制是一種用于維持一致性和隔離性的技術,在kingbase(PostgreSQL)中,使用快照隔離Sanpshot Isolation (簡稱SI) 來實現多版本并發控制,同時以兩階段鎖定 (2PL) 機制為輔。在執行DDL時使用2PL,在執行DML時使用SI。
在數據庫中,同樣也存在著各式各樣的鎖,表級鎖、行級鎖、頁鎖等等,數據庫的并發能力除了和它的并發控制機制有關, 還和數據庫的鎖粒度控制息息相關,粒度越細沖突范圍就越小,并發能力就越強。鎖的最終目的無外乎是為了保證數據的一致性和完整性
kingbase鎖矩陣
場景
Z銀行客戶A系統 性能壓測,壓測時監控數據庫服務器,發現數據庫存在長連接情況,對長鏈接的會話進行鎖分析。
經典獲取鎖信息的SQL:
SELECT blocked_locks.pid AS blocked_pid,blocked_activity.usename AS blocked_user,blocked_activity.client_addr as blocked_client_addr,blocked_activity.client_hostname as blocked_client_hostname,blocked_activity.application_name as blocked_application_name,blocked_activity.wait_event_type as blocked_wait_event_type,blocked_activity.wait_event as blocked_wait_event,blocked_activity.query AS blocked_statement,blocked_activity.xact_start AS blocked_xact_start,blocking_locks.pid AS blocking_pid,blocking_activity.usename AS blocking_user,blocking_activity.client_addr as blocking_client_addr,blocking_activity.client_hostname as blocking_client_hostname,blocking_activity.application_name as blocking_application_name,blocking_activity.wait_event_type as blocking_wait_event_type,blocking_activity.wait_event as blocking_wait_event,blocking_activity.query AS current_statement_in_blocking_process,blocking_activity.xact_start AS blocking_xact_start
FROM pg_catalog.pg_locks blocked_locksJOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pidJOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktypeAND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASEAND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relationAND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.pageAND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tupleAND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxidAND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionidAND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classidAND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objidAND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubidAND blocking_locks.pid != blocked_locks.pidJOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted ORDER BY blocked_activity.pid ;
查詢結果:
通過上述的SQL查詢找到了相互阻塞的SQL,分析等待事件,pid 4183128 在等待一個事務提交,而pid 4183129 是在等客戶端完成操作。
兩個事務操作的不是同一張表,不應該相互阻塞才對,再次確認一下各自的阻塞pid。
查詢阻塞pid SQL:
SELECT pid,pg_blocking_pid(pid),wait_event_type,wait_event,query from pg_stat_activity where pid =xxxxx;
這里說明一下,pid改變,是使用兩次壓測的查詢結果。
查詢結果分析:
通過系統函數pg_blocking_pid可以看到阻塞事務pid和等鎖類型,分析阻塞update操作的確實是select操作的事務,鎖類型是lock,鎖事件是等待事務提交,而select操作的鎖類型是客戶端操作,等待事件是等待客戶端操作提交。
考慮到兩個SQL并不是操作的一個表,且select操作不應該阻塞DML操作,因此繼續使用第二個經典的SQL來查詢一下。
經典SQL2:
with
t_wait as
( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted
),
t_run as
( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted
),
t_overlap as
( select r.* from t_wait w join t_run r on ( r.locktype is not distinct from w.locktype and r.database is not distinct from w.database and r.relation is not distinct from w.relation and r.page is not distinct from w.page and r.tuple is not distinct from w.tuple and r.virtualxid is not distinct from w.virtualxid and r.transactionid is not distinct from w.transactionid and r.classid is not distinct from w.classid and r.objid is not distinct from w.objid and r.objsubid is not distinct from w.objsubid and r.pid <> w.pid )
),
t_unionall as
( select r.* from t_overlap r union all select w.* from t_wait w
)
select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid,
string_agg(
'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)||
'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)||
'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)||
'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)||
'SQL (Current SQL in Transaction): '||chr(10)||
case when query is null then 'NULL' else query::text end,
chr(10)||'--------'||chr(10)
order by ( case mode when 'INVALID' then 0 when 'AccessShareLock' then 1 when 'RowShareLock' then 2 when 'RowExclusiveLock' then 3 when 'ShareUpdateExclusiveLock' then 4 when 'ShareLock' then 5 when 'ShareRowExclusiveLock' then 6 when 'ExclusiveLock' then 7 when 'AccessExclusiveLock' then 8 else 0 end ) desc, (case when granted then 0 else 1 end)
) as lock_conflict
from t_unionall
group by
locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
SQL2查詢結果:
分析SQL2查詢結果,發現update實際被另一個update阻塞。
找到了相互阻塞的事務,現在重點關注的是select操作為什么沒有commit,這就需要看一下業務的代碼了,先現象和分析結果反饋給了業務側開發,開發反饋是方法上之前有一個事務的注解,去掉之后select正常提交,未在出現鎖等待情況。
問題解決方案:
1 數據庫端添加超時參數,idle_in_transaction_session_timeout,單位ms。
2 業務代碼側解決。