The OOM mechanism kills processes that use a lot of memory and have low priority in order to keep the operating system kernel running; if we turn it off, running out of memory can crash the kernel itself.
https://manpages.ubuntu.com/manpages/jammy/en/man1/choom.1.html
Linux kernel uses the badness heuristic to select which process gets killed in out of memory conditions.
Two key parameters are involved:
oom_score
Roughly speaking, oom_score ≈ (memory used / total allowed memory) × 1000; this is the badness score, and the process with the highest score gets killed.
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.
oom_score_adj
The adjust score value is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 to +1000. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, -1000, is equivalent to disabling oom killing entirely for that task since it will always report a badness score of 0.
Setting an adjust score value of +500, for example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task’s allowed memory from being considered as scoring against the task.
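Both values can be read per process under /proc, and the adjust value can be changed either by writing that file or with choom(1) from util-linux (the manpage linked above). A minimal sketch; the PID 2627 is illustrative:

cat /proc/2627/oom_score              # badness score the kernel would use right now
cat /proc/2627/oom_score_adj          # current adjust value
choom -p 2627 -n -900                 # lower the kill preference of a running process
echo -900 > /proc/2627/oom_score_adj  # same effect without choom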
The OOM killer exists to keep the operating system kernel running.
https://www.oracle.com/technical-resources/articles/it-infrastructure/dev-oom-killer.html
The Linux kernel allocates memory upon the demand of the applications running on the system. Because many applications allocate their memory up front and often don’t utilize the memory allocated, the kernel was designed with the ability to over-commit memory to make memory usage more efficient. This over-commit model allows the kernel to allocate more memory than it actually has physically available. If a process actually utilizes the memory it was allocated, the kernel then provides these resources to the application. When too many applications start utilizing the memory they were allocated, the over-commit model sometimes becomes problematic and the kernel must start killing processes in order to stay operational. The mechanism the kernel uses to recover memory on the system is referred to as the out-of-memory killer or OOM killer for short.
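Overcommit behaviour itself is governed by vm.overcommit_memory (0 = heuristic overcommit, the default; 1 = always overcommit; 2 = strict accounting against vm.overcommit_ratio). A quick way to check, shown as a sketch with the stock Ubuntu defaults rather than output captured from this server:

sysctl vm.overcommit_memory vm.overcommit_ratio
vm.overcommit_memory = 0
vm.overcommit_ratio = 50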
To check whether a server's OOM behaviour has been changed, run sysctl -a | grep panic_on_oom. vm.panic_on_oom = 0 (the default) means the OOM killer is enabled; setting it to 1 turns it off in the sense that the kernel will panic on out-of-memory instead of killing processes. To change it, edit /etc/sysctl.conf with vim, set vm.panic_on_oom = 1, then apply it with sysctl -p (see the sketch below).
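As a concrete command sequence (the first value shown is the default; appending to /etc/sysctl.conf is just a shortcut for the vim edit above):

sysctl vm.panic_on_oom                            # check the current setting
vm.panic_on_oom = 0
echo 'vm.panic_on_oom = 1' >> /etc/sysctl.conf    # persist the change
sysctl -p                                         # reload the settings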
An example of PostgreSQL being OOM-killed on Ubuntu
Memory and swap on the server:
root@PGD001:~# free -m
total used free shared buff/cache available
Mem: 32058 19815 2984 4956 9258 6822
Swap: 4095 12 4083
dmesg shows the following:
root@PGD001:~# dmesg -T |grep postgres
[Wed Nov 15 20:31:32 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-postgresql.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/mountdatadomaindir.service,task=bash,pid=2392236,uid=0
[Wed Nov 15 20:31:33 2023] Out of memory: Killed process 2627 (postgres) total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900
[Wed Nov 15 20:31:36 2023] oom_reaper: reaped process 2627 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:2224896kB
Note: anon-rss is the anonymous resident set size (anonymous pages held in physical memory).
Searching the OS log with egrep shows that many other services were OOM-killed as well:
root@PGD001:~# egrep -i -r 'killed process' /var/log/syslog
Nov 15 20:31:34 PGD001 kernel: [1097200.699832] Out of memory: Killed process 2392264 (centrifydc) total-vm:4228kB, anon-rss:156kB, file-rss:1016kB, shmem-rss:0kB, UID:0 pgtables:48kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097200.788886] Out of memory: Killed process 2392236 (bash) total-vm:7368kB, anon-rss:240kB, file-rss:744kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097200.947275] Out of memory: Killed process 872 (systemd-timesyn) total-vm:89356kB, anon-rss:128kB, file-rss:0kB, shmem-rss:0kB, UID:104 pgtables:72kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097201.246575] Out of memory: Killed process 2392239 (boostfs) total-vm:9584kB, anon-rss:256kB, file-rss:216kB, shmem-rss:0kB, UID:0 pgtables:52kB oom_score_adj:0
Nov 15 20:31:34 PGD001 kernel: [1097201.248859] Out of memory: Killed process 2627 (postgres) total-vm:37766764kB, anon-rss:24965976kB, file-rss:2476kB, shmem-rss:2224896kB, UID:115 pgtables:63852kB oom_score_adj:-900
The OS log records the following task-state information for the postgres processes:
root@PGD001:~# vim /var/log/syslog
Nov 15 20:31:34 PGD001 kernel: [1097200.788558] Tasks state (memory values in pages):
Nov 15 20:31:34 PGD001 kernel: [1097200.788559] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Nov 15 20:31:34 PGD001 kernel: [1097200.788606] [ 1033] 115 1033 2169621 34143 536576 453 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788613] [ 1096] 115 1096 18294 356 118784 471 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788616] [ 1097] 115 1097 2169737 1004636 10375168 477 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788619] [ 1098] 115 1098 2169666 21642 315392 484 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788621] [ 1106] 115 1106 2169621 4454 172032 482 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788624] [ 1107] 115 1107 2170049 704 188416 495 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788626] [ 1108] 115 1108 2169647 369 139264 467 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788630] [ 1109] 115 1109 2170018 552 159744 494 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788633] [ 2627] 115 2627 9441691 6798507 65384448 989683 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788636] [2377149] 115 2377149 2171112 6198 315392 468 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788639] [2377150] 115 2377150 2171136 6728 315392 410 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788643] [2382504] 115 2382504 2172165 7311 323584 394 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788649] [2390327] 115 2390327 2172158 7270 323584 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788655] [2392025] 115 2392025 2240168 194775 3178496 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788659] [2392066] 115 2392066 2173236 43043 1736704 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788663] [2392076] 115 2392076 2193878 120763 2428928 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788665] [2392080] 115 2392080 2243484 197961 3190784 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788668] [2392093] 115 2392093 2226648 154194 2289664 371 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788671] [2392095] 115 2392095 2224712 63962 1990656 373 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788673] [2392110] 115 2392110 2241676 167738 2383872 373 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788676] [2392114] 115 2392114 2173331 41251 1609728 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788678] [2392115] 115 2392115 2170462 6530 417792 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788680] [2392116] 115 2392116 2174010 40685 1613824 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788682] [2392117] 115 2392117 2172912 35250 1581056 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788684] [2392124] 115 2392124 2172228 34549 1527808 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788687] [2392127] 115 2392127 2171267 9574 462848 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788689] [2392128] 115 2392128 2202943 240863 3563520 372 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788691] [2392140] 115 2392140 2193312 118464 2379776 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788695] [2392141] 115 2392141 2193830 121733 2387968 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788698] [2392142] 115 2392142 2193614 118047 2404352 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788701] [2392143] 115 2392143 2193945 118742 2387968 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788703] [2392144] 115 2392144 2173072 25293 753664 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788706] [2392145] 115 2392145 2173311 25541 753664 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788708] [2392150] 115 2392150 2170682 9356 434176 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788711] [2392160] 115 2392160 2171583 13385 655360 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788713] [2392162] 115 2392162 2171217 27180 1327104 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788715] [2392163] 115 2392163 2170622 9180 462848 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788717] [2392171] 115 2392171 2171229 25619 1241088 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788720] [2392173] 115 2392173 2170552 6687 376832 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788722] [2392174] 115 2392174 2170484 6309 385024 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788725] [2392176] 115 2392176 2171478 11652 610304 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788727] [2392177] 115 2392177 2170490 7124 425984 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788731] [2392178] 115 2392178 2170710 11417 561152 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788734] [2392179] 115 2392179 2170753 10507 548864 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788737] [2392180] 115 2392180 2170563 8826 471040 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788740] [2392181] 115 2392181 2170563 8319 471040 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788743] [2392182] 115 2392182 2171492 12312 598016 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788747] [2392184] 115 2392184 2171367 17877 815104 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788750] [2392185] 115 2392185 2171227 9037 430080 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788753] [2392195] 115 2392195 2170402 6457 413696 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788757] [2392197] 115 2392197 2171241 16841 843776 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788761] [2392200] 115 2392200 2174518 71413 1560576 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788764] [2392201] 115 2392201 2171813 16907 835584 376 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788766] [2392202] 115 2392202 2170446 6224 323584 379 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788769] [2392215] 115 2392215 2171139 13606 651264 377 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788773] [2392218] 115 2392218 2197655 6652 512000 405 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788777] [2392219] 115 2392219 2197655 6768 512000 405 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788780] [2392233] 115 2392233 2170461 6442 327680 378 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788786] [2392237] 115 2392237 2170063 977 196608 382 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788793] [2392240] 115 2392240 2170063 913 196608 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788797] [2392241] 115 2392241 2170063 690 155648 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788801] [2392242] 115 2392242 2170063 958 172032 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788804] [2392243] 115 2392243 2170063 1174 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788807] [2392244] 115 2392244 2170063 1117 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788810] [2392245] 115 2392245 2170063 716 155648 390 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788813] [2392246] 115 2392246 2170063 958 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788816] [2392247] 115 2392247 2170063 909 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788819] [2392249] 115 2392249 2169653 588 118784 406 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788822] [2392250] 115 2392250 2169621 441 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788825] [2392251] 115 2392251 2170063 968 196608 388 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788828] [2392252] 115 2392252 2169653 403 118784 406 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788831] [2392253] 115 2392253 2169621 536 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788834] [2392254] 115 2392254 2169621 299 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788837] [2392255] 115 2392255 2170063 1316 217088 382 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788841] [2392256] 115 2392256 2170053 974 200704 420 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788845] [2392258] 115 2392258 2169621 367 118784 432 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788847] [2392259] 115 2392259 2169653 557 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788850] [2392260] 115 2392260 2169621 349 118784 433 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788853] [2392261] 115 2392261 2169621 535 118784 435 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788855] [2392262] 115 2392262 2169621 360 118784 433 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788858] [2392263] 115 2392263 2169621 299 118784 407 -900 postgres
Nov 15 20:31:34 PGD001 kernel: [1097200.788860] [2392265] 115 2392265 2169621 364 118784 447 -900 postgres
After PostgreSQL was OOM-killed and restarted:
Check the oom_score and oom_score_adj of the postgres service:
root@PGD001:~# ps -ef|grep postgres |grep PGDATA
postgres 2393324 1 0 02:01 ? 00:00:27 /usr/lib/postgresql/15/bin/postgres -D /PGDATA
root@PGD001:~# cat /proc/2393324/oom_score_adj
-900
root@PGD001:~# cat /proc/2393324/oom_score
70
Check the OS-level parameters:
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmmax
kernel.shmmax = 18446744073692774399
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmall
kernel.shmall = 18446744073692774399
root@DAILACHDBUD001:~# sysctl -a |grep kernel.shmmni
kernel.shmmni = 4096
root@PGD001:~# cat /etc/security/limits.conf |grep -v "#"
* soft nofile 1024
* hard nofile 2048
* soft nproc 1024
* hard nproc 2048
Check the PostgreSQL-level parameters:
postgres=# show shared_buffers;
 shared_buffers
----------------
 8GB
postgres=# show max_connections;
 max_connections
-----------------
 200
postgres=# show work_mem;
 work_mem
----------
 4MB
postgres=# show temp_buffers;
 temp_buffers
--------------
 8MB
postgres=# show maintenance_work_mem;
 maintenance_work_mem
----------------------
 64MB
postgres=# show autovacuum_work_mem;
 autovacuum_work_mem
---------------------
 -1
postgres=# show autovacuum_max_workers;
 autovacuum_max_workers
------------------------
 3
Notes:
shared_buffers: the amount of memory the database server uses for shared memory buffers. On a dedicated database server with 1 GB or more of RAM, a reasonable starting value is 25% of system memory.
work_mem: the amount of memory used by internal sort operations and hash tables before writing to temporary disk files. The default is four megabytes (4MB).
temp_buffers: the maximum number of temporary buffers used by each database session. These are session-local buffers used only for access to temporary tables. The default is 8MB.
autovacuum_work_mem: the maximum amount of memory used by each autovacuum worker process. The default of -1 means maintenance_work_mem is used instead; while autovacuum runs, up to autovacuum_max_workers times this amount of memory may be allocated.
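With these settings, a rough upper bound on PostgreSQL's steady-state memory demand can be sketched (a back-of-envelope estimate only; a single query can use work_mem several times over for multiple sort/hash nodes):

shared_buffers                                     = 8 GB
+ max_connections × work_mem          = 200 × 4 MB ≈ 0.8 GB
+ max_connections × temp_buffers      = 200 × 8 MB ≈ 1.6 GB
+ autovacuum_max_workers × maintenance_work_mem = 3 × 64 MB ≈ 0.2 GB
≈ 10.6 GB

which is comfortably below the 32 GB of physical memory, so the database-level parameters themselves look sane.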
Check current per-process memory usage:
root@PGD001:~# smem -t -r -a | head -20
PID User Command Swap USS PSS RSS
2394448 postgres postgres: 15/main: veeamuser VeeamBackupReporting 172.22.137.89(50228) idle 380 20701320 21192492 21812648
2393326 postgres postgres: 15/main: checkpointer 44 842436 1842987 3047168
3846886 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58846) idle 380 256848 567859 1011720
3846852 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58654) SELECT 380 130864 308700 619768
2392350 root /opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true 3888 292596 292670 294968
3846901 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58926) idle 380 87968 149992 386656
3846903 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58928) idle 380 76084 137657 373052
3846877 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58815) idle 380 26056 126282 387476
2393324 postgres /usr/lib/postgresql/15/bin/postgres -D /PGDATA 44 52700 77518 200092
3846889 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58860) idle 380 25172 57457 205336
633 root /sbin/multipathd -d -s 0 22712 23409 27924
3846902 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58927) idle 380 11204 23353 132416
961 root /usr/lib/snapd/snapd 2132 18836 18876 20848
3846890 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58861) idle 380 8552 18772 117992
3846899 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58914) idle 380 7340 18020 119676
2393327 postgres postgres: 15/main: background writer 80 336 17668 91248
3846898 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58913) idle 380 6092 16874 115916
3846904 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58930) idle 380 7024 15296 104320
3846908 postgres postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(58957) idle 380 11440 14047 66592
root@PGD001:~# smem -u -p -a
User Count Swap USS PSS RSS
systemd-timesync 1 0.00% 0.00% 0.00% 0.02%
messagebus 1 0.01% 0.00% 0.00% 0.02%
systemd-network 1 0.00% 0.01% 0.01% 0.02%
syslog 1 0.00% 0.01% 0.01% 0.02%
systemd-resolve 1 0.00% 0.02% 0.02% 0.03%
root 21 0.22% 1.19% 1.25% 1.56%
postgres 43 0.34% 68.76% 75.26% 90.34%
root@PGD001:~# top
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2394448 postgres 20 0 27.0g 21.0g 2.4g S 0.0 67.1 4100:15 postgres
2393326 postgres 20 0 8678924 2.9g 2.9g S 0.0 9.3 10:32.81 postgres
3855468 postgres 20 0 8807032 846728 818960 S 0.0 2.6 0:00.73 postgres
3855453 postgres 20 0 8877256 587404 492304 S 0.0 1.8 0:07.70 postgres
3855466 postgres 20 0 8778480 383144 287312 S 20.5 1.2 0:02.36 postgres
3855465 postgres 20 0 8775984 380292 288272 S 2.6 1.2 0:01.85 postgres
3855464 postgres 20 0 8774756 376020 287796 S 3.0 1.1 0:01.97 postgres
3855481 postgres 20 0 8703792 371900 345996 S 22.8 1.1 0:01.11 postgres
2392350 root 20 0 1182364 292852 2700 S 0.0 0.9 54:37.50 boostfs
2393324 postgres 20 0 8678484 200092 197304 S 0.0 0.6 24:03.58 postgres
2393327 postgres 20 0 8678648 90992 88152 S 0.0 0.3 0:27.56 postgres
3855473 postgres 20 0 8687624 76032 69008 S 0.7 0.2 0:00.07 postgres
3855483 postgres 20 0 8694496 66440 52784 S 0.0 0.2 0:00.04 postgres
3855463 postgres 20 0 8682872 59528 54192 S 0.3 0.2 0:00.06 postgres
3855482 postgres 20 0 8681836 42672 38304 S 0.0 0.1 0:00.03 postgres
3855471 postgres 20 0 8681720 39628 35356 S 0.0 0.1 0:00.02 postgres
3799836 postgres 20 0 8684528 39412 32052 S 0.0 0.1 0:18.87 postgres
3854094 postgres 20 0 8684520 39336 32008 S 0.0 0.1 0:00.86 postgres
3746080 postgres 20 0 8684644 39300 31820 S 0.0 0.1 0:35.88 postgres
3855475 postgres 20 0 8681652 38492 34096 S 0.0 0.1 0:00.02 postgres
Notes:
VIRT: the size of the process's virtual address space, including both the part already mapped to physical memory and the part not yet mapped. VIRT is short for virtual memory usage; virtual memory is only an address space, and the parts of it that the program actually touches get mapped into physical memory as it runs. A large VIRT only means the process can address a lot of memory, not that it occupies that much physical memory; roughly, VIRT = SWAP + RES.
RES: the part of the virtual address space that is currently mapped into physical memory. To judge how much memory a process is really using, look at RES rather than VIRT. RES is short for resident memory usage; the resident set is the physical memory the process actually occupies, and when we say a process "uses" a certain amount of memory we normally mean its resident memory, not its virtual memory.
SHR: short for "shared", the amount of shared memory the process uses, i.e. memory shared with other processes. Pages of a dynamic library such as libc.so are shared memory, because many different processes map and call the same libc.so.
VSS: Virtual Set Size, the virtual memory the process has requested from the system; equivalent to VIRT.
RSS: Resident Set Size, the total memory the process actually holds in RAM; equivalent to RES.
PSS: Proportional Set Size, the process's private memory plus its proportional share of shared memory (each shared page is divided evenly among the processes that share it).
USS: Unique Set Size, the physical memory private to the process, i.e. what would be freed if the process exited.
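PSS is what smem reports in its PSS column; as a cross-check it can be summed straight from /proc. A minimal sketch (assumes /proc/<pid>/smaps_rollup is available, i.e. kernel 4.14 or newer, and root privileges; the label in the printf is illustrative):

for p in $(pgrep -u postgres); do cat /proc/$p/smaps_rollup 2>/dev/null; done \
  | awk '/^Pss:/ {s+=$2} END {printf "postgres total PSS: %.0f MB\n", s/1024}'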
Check session information:
postgres=# show idle_session_timeout;
 idle_session_timeout
----------------------
 0
postgres=# select count(*) from pg_stat_activity where state='idle';
 count
-------
    45
postgres=# select pid,usename,datname,client_addr,state from pg_stat_activity;
   pid   |  usename  |       datname        |  client_addr  | state
---------+-----------+----------------------+---------------+--------
 2393349 |           |                      |               |
 2393351 | postgres  |                      |               |
 3931348 | postgres  | postgres             | 172.22.138.94 | idle
 3949134 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949093 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949090 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949094 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949135 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 2394448 | veeamuser | VeeamBackupReporting | 172.22.137.89 | idle
 3854094 | postgres  | postgres             | 172.22.138.94 | idle
 3949083 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949102 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949103 | veeamuser | VeeamBackup          | 172.22.137.89 | active
 3949127 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949084 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949132 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949133 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949095 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949119 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949085 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949125 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949086 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949117 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949104 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949105 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949096 | veeamuser | VeeamBackupReporting | 172.22.137.89 | idle
 3949098 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949136 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949099 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949100 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949101 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949106 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949107 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949137 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949047 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949138 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949108 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
 3949109 | veeamuser | VeeamBackup          | 172.22.137.89 | idle
...
root@PGD001:~# ps -ef|grep postgres
root 2392350 1 0 Nov15 ? 00:59:03 /opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
postgres 2393324 1 0 Nov15 ? 00:25:46 /usr/lib/postgresql/15/bin/postgres -D /PGDATA
postgres 2393325 2393324 0 Nov15 ? 00:00:00 postgres: 15/main: logger
postgres 2393326 2393324 0 Nov15 ? 00:11:52 postgres: 15/main: checkpointer
postgres 2393327 2393324 0 Nov15 ? 00:00:30 postgres: 15/main: background writer
postgres 2393348 2393324 0 Nov15 ? 00:04:30 postgres: 15/main: walwriter
postgres 2393349 2393324 0 Nov15 ? 00:00:54 postgres: 15/main: autovacuum launcher
postgres 2393350 2393324 0 Nov15 ? 00:00:05 postgres: 15/main: archiver last was 0000000100000137000000E1
postgres 2393351 2393324 0 Nov15 ? 00:00:51 postgres: 15/main: logical replication launcher
postgres 2394448 2393324 38 Nov15 ? 3-02:20:42 postgres: 15/main: veeamuser VeeamBackupReporting 172.22.137.89(50228) SELECT
postgres 3854094 2393324 0 Nov22 ? 00:00:30 postgres: 15/main: postgres postgres 172.22.138.94(63531) idle
postgres 3906865 2393324 0 09:40 ? 00:00:12 postgres: 15/main: postgres postgres 172.22.138.94(52002) idle
postgres 3931348 2393324 0 15:17 ? 00:00:04 postgres: 15/main: postgres postgres 172.22.138.94(54154) idle
postgres 3949113 2393324 24 19:06 ? 00:00:15 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51688) idle
postgres 3949122 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51913) idle
postgres 3949123 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(51970) idle
postgres 3949179 2393324 27 19:06 ? 00:00:08 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52262) SELECT
postgres 3949182 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52282) idle
postgres 3949184 2393324 1 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52300) idle
postgres 3949185 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52307) idle
postgres 3949186 2393324 2 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52309) idle
postgres 3949187 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52313) idle
postgres 3949190 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52372) idle
postgres 3949191 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52374) idle
postgres 3949192 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52375) idle
postgres 3949194 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52376) idle
postgres 3949196 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52449) idle
postgres 3949197 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52473) idle
postgres 3949198 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52475) idle
postgres 3949199 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52476) idle
postgres 3949201 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52508) idle
postgres 3949202 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52515) idle
postgres 3949205 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52539) idle
postgres 3949206 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52541) idle
postgres 3949207 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52543) idle
postgres 3949208 2393324 27 19:06 ? 00:00:03 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52542) idle
postgres 3949209 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52567) idle
postgres 3949210 2393324 0 19:06 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52574) idle
postgres 3949212 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52599) idle
postgres 3949218 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52600) idle
postgres 3949219 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52638) idle
postgres 3949220 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52651) idle
postgres 3949222 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52666) idle
postgres 3949224 2393324 0 19:07 ? 00:00:00 postgres: 15/main: veeamuser VeeamBackup 172.22.137.89(52670) idle
...
Analysis: with only 32 GB of physical memory, the postgres process caught by the OOM killer had grown to total-vm:37766764kB (roughly 36 GB). The PostgreSQL memory parameters are all reasonable, and the postgres processes already run with a very low oom_score_adj of -900 (only -1000 would exempt them from the OOM killer). When PostgreSQL is busy it occupies 70%-90% of the operating system's memory, and during the OOM event many services besides the postgres backends were killed as well. The likely culprits are therefore at the OS level: kernel.shmmax and kernel.shmall are set to effectively unlimited values; memory stays high even though most sessions are idle, so leaving idle_session_timeout at 0 (never time out) is questionable; and 4 GB of swap is too small for this workload.
To avoid another OOM kill, the following measures were taken:
1. Set kernel.shmmax to 17179869184 (half of physical memory) and kernel.shmall to 4194304 (= shmmax / page_size).
root@PGD001:~# vim /etc/sysctl.conf
kernel.shmmax=17179869184
kernel.shmall=4194304
root@PGD001:~# sysctl -p
root@PGD001:~# sysctl -a |grep kernel.shmmax
kernel.shmmax = 17179869184
root@PGD001:~# sysctl -a |grep kernel.shmall
kernel.shmall = 4194304
root@PGD001:~# sysctl -a |grep kernel.shmmni
kernel.shmmni = 4096
root@PGD001:~# ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1
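The shmall figure follows directly from the page size, and the ipcs -lm output above confirms the new ceiling (17179869184 bytes = 16777216 kB = 16 GiB). A quick check (4096 bytes is the usual x86-64 page size):

getconf PAGE_SIZE               # 4096 on x86-64
echo $((17179869184 / 4096))    # prints 4194304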
2. Set idle_session_timeout to 8h.
postgres=# alter system set idle_session_timeout='8h';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
postgres=# show idle_session_timeout;
 idle_session_timeout
----------------------
 8h
3. Increase swap to the size of physical memory, i.e. 32 GB.
root@PGD001:~# free -m
total used free shared buff/cache available
Mem: 32058 20278 2195 4967 9583 6349
Swap: 4095 12 4083
root@PGD001:~# swapon -s
Filename Type Size Used Priority
/swap.img file 4194300 13120 -2
root@PGD001:/# cat /etc/fstab |grep swap
/swap.img none swap sw 0 0
root@PGD001:/# ll /swap.img
-rw------- 1 root root 4294967296 Sep 6 2022 /swap.img
root@PGD001:/# fallocate -l 4G /swap1.img
root@PGD001:/# chmod 600 /swap1.img
root@PGD001:/# ll /swap1.img
-rw------- 1 root root 4294967296 Nov 22 22:51 /swap1.img
root@PGD001:/# mkswap /swap1.img
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=85b78962-8bae-48d8-a5c0-e30903b7b8d6
root@PGD001:/# swapon /swap1.img
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20102 2325 4967 9630 6525
Swap: 8191 12 8179
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap.img file 4194300 13120 -2
/swap1.img file 4194300 0 -3
root@PGD001:/# swapoff -v /swap.img
swapoff /swap.img
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap1.img file 4194300 0 -2
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20149 2276 4969 9632 6476
Swap: 4095 0 4095
root@PGD001:/# fallocate -l 32G /swap.img
root@PGD001:/# chmod 600 /swap.img
root@PGD001:/# ll /swap.img
-rw------- 1 root root 34359738368 Nov 22 22:53 /swap.img
root@PGD001:/# mkswap /swap.img
mkswap: /swap.img: warning: wiping old swap signature.
Setting up swapspace version 1, size = 32 GiB (34359734272 bytes)
no label, UUID=9d658937-a89d-472b-aa94-be23e7f8703c
root@PGD001:/# swapon /swap.img
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20272 2149 4969 9637 6354
Swap: 36863 0 36863
root@PGD001:/# swapoff -v /swap1.img
swapoff /swap1.img
root@PGD001:/# swapon -s
Filename Type Size Used Priority
/swap.img file 33554428 0 -2
root@PGD001:/# free -m
total used free shared buff/cache available
Mem: 32058 20342 2078 4969 9637 6283
Swap: 32767 0 32767
root@PGD001:/# cat /etc/fstab |grep swap
/swap.img none swap sw 0 0