這是一個純腳本工具,用于從PostgreSQL的pg_stat_activity視圖中定期收集數據并保存到本地日志文件。
相關背景:
- 某個慢SQL打滿內存,導致系統kill掉postgres的某個進程,進而導致postgres進程重啟,沒有現場排查不了具體原因。(即使開啟了慢SQL日志,沒有執行完也不會記錄到數據庫日志中)
- 數據庫連接數被打滿,PG相關監控數據丟失(因為也連不上數據庫了),沒有現場,不知道異常請求來源。
特性:
- 定期收集PostgreSQL活動會話信息
- 支持通過定時任務配置收集頻率
- 提供日志文件自動分割功能
- 包含豐富的日志分析示例
安裝指南
拉取代碼,修改參數,設置定時任務。
# 克隆代碼
git clone git@github.com:yansheng836/pg_collect_pgsa.git
cd pg_collect_pgsa# 修改必要參數(均以 PG_ 開頭,例如:PG_PATH、PG_HOST 等)
vi pg_collect_pgsa.sh# 查路徑
pwd# crontab -e
# 每分鐘執行
* * * * * pwd路徑/pg_collect_pgsa.sh# 每5秒執行(可自行調整秒數)
* * * * * pwd路徑/pg_collect_pgsa_gap_second.sh 5
日志文件內容
測試版本:PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
輸出字段為:now(),datid, datname, pid, leader_pid, usesysid, usename, application_name, client_addr, client_hostname, client_port, backend_start, xact_start, query_start, state_change, wait_event_type, wait_event, state, backend_xid, backend_xmin, query_id, query, backend_type
2025-08-28 13:02:22.151458+08|||29360||||||||2025-08-12 13:58:41.03657+08||||Activity|CheckpointerMain||||||checkpointer
2025-08-28 13:02:22.151458+08|||29361||||||||2025-08-12 13:58:41.036868+08||||Activity|BgWriterHibernate||||||background writer
2025-08-28 13:02:22.151458+08|||29363||||||||2025-08-12 13:58:41.043339+08||||Activity|WalWriterMain||||||walwriter
2025-08-28 13:02:22.151458+08|||29365||10|postgres|||||2025-08-12 13:58:41.04334+08||||Activity|LogicalLauncherMain||||||logical replication launcher
2025-08-28 13:02:22.151458+08|||29364||||||||2025-08-12 13:58:41.043811+08||||Activity|AutoVacuumMain||||||autovacuum launcher
2025-08-28 13:02:22.151458+08|5|postgres|6583||10|postgres|Navicat|42.99.63.72||36481|2025-08-28 12:34:20.191304+08||2025-08-28 12:47:55.618303+08|2025-08-28 12:47:55.619804+08|Client|ClientRead|idle|||7982016161531118154|SELECT now(),datid, datname, pid, leader_pid, usesysid, usename, application_name, client_addr, client_hostname, client_port, backend_start, xact_start, query_start, state_change, wait_event_type, wait_event, state, backend_xid, backend_xmin, query_id, query, backend_type from pg_stat_activity WHERE pid <> pg_backend_pid() ORDER BY backend_start ASC|client backend
2025-08-28 13:02:22.151458+08|5|postgres|6611||10|postgres|Navicat|42.99.63.72||36773|2025-08-28 12:34:26.810414+08||2025-08-28 12:47:55.670278+08|2025-08-28 12:47:55.670683+08|Client|ClientRead|idle|||7746404270258954630|SELECT c.conkey FROM pg_constraint c WHERE c.contype = 'p' and c.conrelid = 12222|client backend
2025-08-28 13:02:23.339309+08|||29360||||||||2025-08-12 13:58:41.03657+08||||Activity|CheckpointerMain||||||checkpointer
2025-08-28 13:02:23.339309+08|||29361||||||||2025-08-12 13:58:41.036868+08||||Activity|BgWriterHibernate||||||background writer
2025-08-28 13:02:23.339309+08|||29363||||||||2025-08-12 13:58:41.043339+08||||Activity|WalWriterMain||||||walwriter
2025-08-28 13:02:23.339309+08|||29365||10|postgres|||||2025-08-12 13:58:41.04334+08||||Activity|LogicalLauncherMain||||||logical replication launcher
2025-08-28 13:02:23.339309+08|||29364||||||||2025-08-12 13:58:41.043811+08||||Activity|AutoVacuumMain||||||autovacuum launcher
2025-08-28 13:02:23.339309+08|5|postgres|6583||10|postgres|Navicat|42.99.63.72||36481|2025-08-28 12:34:20.191304+08||2025-08-28 12:47:55.618303+08|2025-08-28 12:47:55.619804+08|Client|ClientRead|idle|||7982016161531118154|SELECT now(),datid, datname, pid, leader_pid, usesysid, usename, application_name, client_addr, client_hostname, client_port, backend_start, xact_start, query_start, state_change, wait_event_type, wait_event, state, backend_xid, backend_xmin, query_id, query, backend_type from pg_stat_activity WHERE pid <> pg_backend_pid() ORDER BY backend_start ASC|client backend
2025-08-28 13:02:23.339309+08|5|postgres|6611||10|postgres|Navicat|42.99.63.72||36773|2025-08-28 12:34:26.810414+08||2025-08-28 12:47:55.670278+08|2025-08-28 12:47:55.670683+08|Client|ClientRead|idle|||7746404270258954630|SELECT c.conkey FROM pg_constraint c WHERE c.contype = 'p' and c.conrelid = 12222|client backend
日志分析參考
1.簡單檢索
# cat/more/less/grep
grep 'idle' pgsa.log# 查找具體時間的相關日志
grep '2025-09-04 12:59' pgsa.log# 在歸檔日志中,查找具體時間的相關日志
zless logs/pgsa-20250904-12.log.gz | grep '2025-09-04 12:59'
2.統計不同狀態的語句的數量
# 第18列是狀態:state
awk -F '|' '{print $18}' pgsa.log | sort | uniq -c10 4 idle
3.按照時間統計
# 按天統計
awk -F '|' '{print $1}' pgsa.log | cut -d ' ' -f1 | sort | uniq -c14 2025-08-28
# 按小時統計
awk -F '[| ]' '{print $1 " " $2}' pgsa.log | cut -d: -f1 | sort | uniq -c7 2025-08-28 127 2025-08-28 14
# 按分鐘統計
awk -F '[| ]' '{print $1 " " $2}' pgsa.log | cut -d: -f1-2 | sort | uniq -c7 2025-08-28 12:597 2025-08-28 14:09
注意事項
- 在業務繁忙的數據庫上使用時,需要注意日志文件可能會快速增長,建議在特殊情況下短暫使用,并密切關注磁盤空間。
query
字段的長度受PostgreSQL參數track_activity_query_size
限制,默認為1024,超出部分會被截斷。修改此參數需要重啟數據庫服務。- 賬號權限問題,可不使用postgres。推薦最小權限:[創建空庫,]創建普通用戶,授予
pg_read_all_stats
角色即可。-- CREATE DATABASE pgsadb; CREATE USER pgsa_user with password 'your password'; GRANT pg_read_all_stats TO pgsa_user;
倉庫
詳見:
GitHub:https://github.com/yansheng836/pg_collect_pgsa
Gitee:https://gitee.com/yansheng0083/pg_collect_pgsa