一次 Linux 高負載(Load)異常排查實錄
- 一、背景及排查過程材料
- 二、排查分析
- 2.1Load 的真正含義
- 2.2:確認異常進程
- 2.3:線程卡在哪(wchan)
- 2.4:perf 采樣(用戶態/內核態熱點)
- 2.5:熱點文件路徑(誰在被頻繁訪問)
- 2.6:打開文件數(以 /proc 為準)
- 2.7:火焰圖分析(生成 + 示意圖)
- 2.8:結論(本次事件)
- 2.9:優化建議(可落地)
- A. 線程狀態與等待點巡檢(watch)
- B. 確認 lookup_slow 與 fstatat 熱點路徑
- C. 進程級路徑熱點統計(openat/fstatat)
- D. 聚焦前 N 個 lookup_slow 線程(內核棧 + 快速 syscalls)
- 三、本次排查總結
一、背景及排查過程材料
- 生產環境一臺 64 核 Linux 服務器,
load5
突然飆升至 500+。 - 通過
top
觀察,CPU idle 仍約 90%,CPU 未被打滿。 - 自動化告警展示異常節點的 top5 進程,定位到
backend
組件。
疑問:CPU 很閑,為何 Load 還在飆升?
[root@kyserver-xxxx ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Stepping: 7
CPU MHz: 2300.000
BogoMIPS: 4600.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 22528K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
[root@kyserver-xxxxx ~]# free -gtotal used free shared buff/cache available
Mem: 251 41 53 3 156 75
Swap: 0 0 0
# 卡在 lookup_slow → link_path_walk → path_lookupat → vfs_fstatat
[root@kyserver-21-024 mmwei3]# ./top_thread.sh 36334 5 10
=== 2025-09-03 09:59:52 ===
進程 36334, 抓取前 5 個 lookup_slow 線程>>> 線程 TID=36343 內核調用棧 (/proc/36343/stack):
[<ffffffff816a1313>] lookup_slow+0x33/0xa7
[<ffffffff8120e9cf>] link_path_walk+0x80f/0x8b0
[<ffffffff8120ebcb>] path_lookupat+0x6b/0x7b0
[<ffffffff8120f33b>] filename_lookup+0x2b/0xc0
[<ffffffff81212e47>] user_path_at_empty+0x67/0xc0
[<ffffffff81212eb1>] user_path_at+0x11/0x20
[<ffffffff81206463>] vfs_fstatat+0x63/0xc0
[<ffffffff81206a84>] SYSC_newfstatat+0x24/0x60
[<ffffffff81206cce>] SyS_newfstatat+0xe/0x10
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 線程 TID=36343 strace 抓取 10s:
strace: Process 36343 attached with 586 threads
[pid 63559] futex(0xc015e21948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43629] futex(0xc01d484148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37798] futex(0xc017812148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37791] futex(0xc015a16148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37790] futex(0xc01570f148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37786] futex(0xc015492948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37783] futex(0xc01570e948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37779] futex(0xc015b12148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37758] futex(0xc01519a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37753] futex(0xc014f84148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37749] futex(0xc010d6c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37682] futex(0xc0167ec148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37679] futex(0xc015748148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37672] futex(0xc016f06148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37670] futex(0xc0166f4148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37667] futex(0xc016616148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37661] futex(0xc016704148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37660] futex(0xc016ab2148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37653] futex(0xc0168c4148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 線程 TID=36344 內核調用棧 (/proc/36344/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 線程 TID=36344 strace 抓取 10s:
strace: Process 36344 attached with 586 threads
[pid 200441] futex(0xc04501c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200440] futex(0xc04501c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200439] futex(0xc00e47d948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 87025] futex(0xc016f80948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85361] futex(0xc00d049948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85360] futex(0xc015e3c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 63559] futex(0xc015e21948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 45979] futex(0xc00e2a6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 44863] futex(0xc02f980148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43635] futex(0xc01bce6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43634] futex(0xc01dfea148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] futex(0xc028984148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43632] futex(0xc028540148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43631] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 43630] futex(0xc01c98a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43629] futex(0xc01d484148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38629] futex(0xc017472148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38628] futex(0xc01810c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37808] futex(0xc01532c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 線程 TID=36345 內核調用棧 (/proc/36345/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 線程 TID=36345 strace 抓取 10s:
strace: Process 36345 attached with 586 threads
[pid 200441] futex(0xc04501c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200440] futex(0xc04501c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200439] futex(0xc00e47d948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 87025] futex(0xc016f80948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85361] futex(0xc00d049948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85360] futex(0xc015e3c948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 63559] futex(0xc015e21948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 45979] futex(0xc00e2a6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 44863] futex(0xc02f980148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43635] futex(0xc01bce6148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43634] futex(0xc01dfea148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] futex(0xc028984148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43632] futex(0xc028540148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43631] futex(0xc01b9ee148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43630] futex(0xc01c98a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43629] futex(0xc01d484148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38629] futex(0xc017472148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38628] futex(0xc01810c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 37808] futex(0xc01532c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 線程 TID=36354 內核調用棧 (/proc/36354/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 線程 TID=36354 strace 抓取 10s:
strace: Process 36354 attached with 586 threads
[pid 200441] getsockname(910, <unfinished ...>
[pid 200440] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
[pid 200439] futex(0xc00e47dd28, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 87025] futex(0xc016f80948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 85361] openat(AT_FDCWD, "/proc/36334/fd", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 85360] write(864, "+\1\0\0\33\377\267\1\0\0\0\0\335\7\0\0\0\0\1\0\0\26\1\0\0\20hello\0\1\0\0\0\10helloOk\0\1\3topologyVersion\0-\0\0\0\7processId\0f\341M\f \357\266\300\241qi\244\22counter\0\23\0\0\0\0\0\0\0\0\22maxAwaitTimeMS\0\20'\0\0\0\0\0\0\3$clusterTime\0X\0\0\0\21clusterTime\0\2\0\0\0\217\241\267h\3signature\0003\0\0\0\5hash\0\24\0\0\0\0\230\36\262"..., 299 <unfinished ...>
[pid 63559] read(667, <unfinished ...>
[pid 45979] accept4(267, <unfinished ...>
[pid 44863] futex(0xc02f980148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] write(45, "{\"level\":\"debug\",\"time\":\"2025-09-03 10:01:52.105\",\"caller\":\"imp/imp_mysql.go:659\",\"msg\":\"sql: SELECT dls_task_status.user_id,dls_task_status.sid,\\n\\tdls_task_status.task_name,dls_task_status.status,dl"..., 550 <unfinished ...>
[pid 43632] read(556, <unfinished ...>
[pid 43630] futex(0xc01c98a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 38629] openat(AT_FDCWD, "/proc/36334/fd", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 38628] read(559, <unfinished ...>
[pid 37807] futex(0xc016bb6528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37806] futex(0xc011f74528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37805] read(157, <unfinished ...>
[pid 37797] write(860, "+\1\0\0\34\377\267\1\0\0\0\0\335\7\0\0\0\0\1\0\0\26\1\0\0\20hello\0\1\0\0\0\10helloOk\0\1\3topologyVersion\0-\0\0\0\7processId\0eU\232\350N5\317\224dy\7\23\22counter\0\35\0\0\0\0\0\0\0\0\22maxAwaitTimeMS\0\20'\0\0\0\0\0\0\3$clusterTime\0X\0\0\0\21clusterTime\0\2\0\0\0\217\241\267h\3signature\0003\0\0\0\5hash\0\24\0\0\0\0\230\36\262"..., 299 <unfinished ...>
[pid 37796] futex(0xc016584948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>>>> 線程 TID=36355 內核調用棧 (/proc/36355/stack):
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff>>> 線程 TID=36355 strace 抓取 10s:
strace: Process 36355 attached with 586 threads
[pid 200441] read(1128, <unfinished ...>
[pid 200440] futex(0xc04501c148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 200439] futex(0xc00e47dd28, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 85361] futex(0xc00d049948, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 45979] read(1405, <unfinished ...>
[pid 44863] read(1062, <unfinished ...>
[pid 43634] futex(0xc01dfea148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43633] write(996, "\1\0\0\0\1", 5 <unfinished ...>
[pid 43630] futex(0xc01c98a528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 43629] write(1053, "\1\0\0\0\1", 5 <unfinished ...>
[pid 38629] read(1406, <unfinished ...>
[pid 38628] write(949, "\1\0\0\0\1", 5 <unfinished ...>
[pid 37808] read(927, <unfinished ...>
[pid 37807] read(1252, <unfinished ...>
[pid 37806] futex(0xc011f74528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37805] epoll_pwait(4, <unfinished ...>
[pid 37804] futex(0xc01688e528, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 37798] epoll_pwait(4, <unfinished ...>
[pid 37797] epoll_pwait(4, <unfinished ...>
二、排查分析
2.1Load 的真正含義
- Linux 的 loadavg 表示:
- 可運行隊列(R 狀態)中的進程/線程數;
- 不可中斷等待(D 狀態,IO/鎖等待)中的進程/線程數。
- 關鍵認知:
- CPU 忙 → R 多 → Load 高;
- IO/鎖等待嚴重 → D 多 → Load 同樣高,即使 CPU 很閑。
補充定義(便于閱讀):
- R(Running/Runable):可運行隊列,等 CPU 或正在執行
- D(Uninterruptible sleep):不可中斷等待,多見于 IO 與鎖
- S(Sleeping):可中斷睡眠
- futex(Fast Userspace Mutex):用戶態互斥鎖,競爭/等待會進入內核等待點
2.2:確認異常進程
查看線程數:
ps -eLf | grep 36334 | wc -l
結果:
586
統計線程狀態:
ps -eLf | awk '{print $2,$3,$5}' | sort | uniq -c | grep 36334
輸出:
586 Sl
含義:backend
進程有 586 個線程,大多數處于輕度睡眠(Sl),并非瘋狂跑 CPU。
2.3:線程卡在哪(wchan)
查看線程等待點:
ps -eLo pid,tid,state,wchan:32,cmd | grep 36334 | awk '{print $4}' | sort | uniq -c | sort -nr | head
結果:
581 futex_wait_queue_me2 ep_poll1 hrtimer_nanosleep
👉 絕大多數線程卡在 futex_wait_queue_me
,典型的“等鎖”。
2.4:perf 采樣(用戶態/內核態熱點)
對進程進行熱點采樣:
perf top -p 36334
輸出熱點(片段):
55.83% backend [.] runtime.(*stkframe).getStackMap
15.95% [kernel] [k] down_read_trylock
15.95% [kernel] [k] osq_lock
12.22% backend [.] runtime.greyobject
分析:
- 用戶態熱點集中在 Go GC 路徑(
runtime.*
); - 內核態熱點集中在鎖(
down_read_trylock
/osq_lock
)。
若需要低風險取樣并生成火焰圖,見下文“Step 7:火焰圖分析”。
2.5:熱點文件路徑(誰在被頻繁訪問)
統計進程 FD 指向的熱點路徑:
ls -l /proc/36334/fd | awk '{print $NF}' | sort | uniq -c | sort -nr | head
結果(片段):
5129 config.toml
5113 ./ky_backend.log90 /train28/cog8/permanent/jingye1117 ./backend.log12 /opt/dls_cli/config_3000p
解讀:頻繁訪問配置文件與日志文件;部分路徑位于分布式存儲(/train28/...
),更易導致 IO 延遲抖動。
2.6:打開文件數(以 /proc 為準)
精確統計(基于 /proc
):
ls /proc/36334/fd | wc -l
輸出:
803
lsof
統計(會放大):
lsof | awk '{print $2}' | sort | uniq -c | sort -nr | head -5
輸出:
472667 36334
👉 lsof
默認按線程展開,存在重復統計。準確數字應以 /proc/<pid>/fd
為準。
2.7:火焰圖分析(生成 + 示意圖)
低風險采樣生成火焰圖:
# 1) 錄制 30 秒的采樣(49Hz,抓取調用棧)
perf record -F 49 -p 36334 -g -- sleep 30# 2) 將 perf 數據折疊為火焰圖輸入
perf script | ./FlameGraph/stackcollapse-perf.pl > out.folded# 3) 生成火焰圖 SVG
./FlameGraph/flamegraph.pl out.folded > flame.svg
權限與環境注意:
# 若提示權限不足,可臨時放寬(root 執行):
sysctl -w kernel.perf_event_paranoid=-1
sysctl -w kernel.kptr_restrict=0# 或通過 /proc 接口(臨時):
echo -1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
火焰圖 ASCII 示意(僅結構參考):
^ 棧深度(越上層越深)
|
| ██████████████ runtime.greyobject
| ████████ runtime.gcDrain
| ████ ████████ runtime.scanobject
| ████ ████ ████████ futex_wait_queue_me
| ████ ████ ████████ osq_lock
+----------------------------------------------> 采樣時間占比(越長越寬)結論:
- 用戶態:GC 相關路徑明顯(runtime.* 系列)。
- 內核態:futex/鎖等待顯著(`futex_wait_queue_me`、`osq_lock`)。
2.8:結論(本次事件)
backend
進程開啟 ~586 個線程;- 大量線程阻塞在
futex_wait_queue_me
(鎖等待); - 程序頻繁進行
openat/fstatat
(配置 + 日志 + 分布式存儲路徑); - 分布式存儲響應慢,線程堆積,GC 活躍加劇鎖競爭;
- 導致
load5
飆升至 500+,但 CPU idle 仍然很高。
2.9:優化建議(可落地)
-
減少頻繁文件 IO:
- 對配置文件做緩存(內存+變更訂閱或定期校驗),避免每次請求都
stat/open
; - 日志改為異步寫/批量 flush,控制日志級別與切分策略,避免熱日志放大;
- 合理合并小文件訪問,盡量減少路徑解析與權限校驗次數。
- 對配置文件做緩存(內存+變更訂閱或定期校驗),避免每次請求都
-
檢查與隔離存儲性能:
- 對分布式存儲掛載路徑做延遲/帶寬基線評估與報警(fio/oss 工具);
- 關鍵小文件下沉本地緩存(鏡像內置或啟動時下發),減少遠端抖動影響;
- IO 讀多寫少場景,優先本地讀,異步刷新遠端。
-
線程池與鎖競爭控制:
- 限制線程池上限與排隊策略,避免“線程海”在 IO 上等待;
- 細化臨界區,拆大鎖或使用 RWLock/無鎖結構,降低 futex 沖突;
- 避免在高并發路徑上持有全局共享資源(如全局 logger)的長臨界區。
-
Go GC 調優(針對 Go 服務):
- 打開 GC trace:
GODEBUG=gctrace=1 ./backend
- 評估
GOGC
(目標堆增長百分比),減少短生命周期對象分配,復用緩沖; - 避免在 GC 敏感區內進行大量阻塞 IO。
- 打開 GC trace:
A. 線程狀態與等待點巡檢(watch)
# check_thread.sh
#!/bin/bash
# 用法: ./check_thread.sh <pid> [interval]
# 默認間隔 1 秒,實時統計線程狀態與前10個 wchanPID=$1
INTERVAL=${2:-1}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [interval]"exit 1
fiwhile true; doclearecho "=== $(date '+%F %T') ==="echo "進程 $PID 的線程狀態統計:"ps -eLo pid,tid,stat,wchan:20,comm | awk -v pid=$PID '$1==pid {print $3}' | sort | uniq -c | sort -nrechoecho "前10個等待點 (wchan):"ps -eLo pid,tid,stat,wchan:20,comm | awk -v pid=$PID '$1==pid {print $4}' | sort | uniq -c | sort -nr | head -10sleep $INTERVAL
done
B. 確認 lookup_slow 與 fstatat 熱點路徑
# confirm_lookup_slow.sh
#!/bin/bash
# 用法: ./confirm_lookup_slow.sh <pid> [topN] [seconds]
# 例: ./confirm_lookup_slow.sh 36334 5 10PID=$1
TOPN=${2:-5}
DURATION=${3:-10}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [topN] [seconds]"exit 1
fiecho "=== $(date '+%F %T') ==="
echo "進程 $PID, 抓取前 $TOPN 個 lookup_slow 線程, 每個采樣 ${DURATION}s"# 找出 lookup_slow 的前 N 個線程
TIDS=$(ps -eLo pid,tid,wchan | awk -v pid=$PID '$1==pid && $3=="lookup_slow" {print $2}' | head -$TOPN)for tid in $TIDS; doechoecho ">>> TID=$tid 內核棧:"cat /proc/$tid/stackechoecho ">>> TID=$tid fstatat 調用路徑 (采樣 ${DURATION}s):"timeout $DURATION strace -p $tid -e trace=fstat,fstatat,newfstatat,fstat64 -s 200 2>&1 \| grep -E 'fstat|fstatat|newfstatat' \| grep -oE '\".*\"' \| tr -d '"' \| sort | uniq -c | sort -nr | head -20
done
C. 進程級路徑熱點統計(openat/fstatat)
# trace_lookup_paths.sh
#!/bin/bash
# 用法: ./trace_lookup_paths.sh <pid> [seconds]
# 用 strace 抓取 openat/fstatat 調用并統計最熱路徑PID=$1
DURATION=${2:-10}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [seconds]"exit 1
fiecho "=== $(date '+%F %T') ==="
echo "進程 $PID, 采樣 ${DURATION}s, 統計路徑熱點"# 只 trace 路徑相關 syscall,減少干擾
timeout $DURATION strace -e trace=openat,newfstatat,fstatat64 -s 200 -fp $PID 2>&1 \| grep -E 'openat|fstatat|newfstatat' \| awk -F\" '{print $2}' \| sort | uniq -c | sort -nr | head -20
D. 聚焦前 N 個 lookup_slow 線程(內核棧 + 快速 syscalls)
# trace_lookup_slow.sh
#!/bin/bash
# 用法: ./trace_lookup_slow.sh <pid> [topN] [trace_seconds]
# 例如: ./trace_lookup_slow.sh 36334 5 10PID=$1
TOPN=${2:-5}
TRACE_SEC=${3:-5}if [ -z "$PID" ]; thenecho "用法: $0 <pid> [topN] [trace_seconds]"exit 1
fiecho "=== $(date '+%F %T') ==="
echo "進程 $PID, 抓取前 $TOPN 個 lookup_slow 線程"# 找出 lookup_slow 的 TID
TIDS=$(ps -eLo pid,tid,stat,wchan:30,comm | \awk -v pid=$PID '$1==pid && $3 ~ /^D/ && $4=="lookup_slow" {print $2}' | \head -n $TOPN)if [ -z "$TIDS" ]; thenecho "沒有找到 lookup_slow 線程"exit 0
fifor tid in $TIDS; doechoecho ">>> 線程 TID=$tid 內核調用棧 (/proc/$tid/stack):"cat /proc/$tid/stack 2>/dev/null || echo " (無法讀取 stack,可能內核不支持)"echoecho ">>> 線程 TID=$tid strace 抓取 ${TRACE_SEC}s:"timeout $TRACE_SEC strace -s 200 -fp $tid 2>&1 | head -20
done
三、本次排查總結
- 本次問題根因:
backend
頻繁訪問配置/日志文件 + 部分 IO 落在分布式存儲導致延遲;配合 Go GC 活躍與鎖競爭,線程大量阻塞于futex_wait_queue_me
,最終引發 Load 飆升但 CPU 仍閑。 - 經驗要點:
- Load 高 ≠ CPU 忙;優先排查 IO/鎖等待;
- 先看線程狀態與 wchan 快速定向;
- 再用
perf/strace
與火焰圖定位熱點; - 以
/proc/<pid>/fd
獲取真實 FD 數; - 方案從“緩存/異步/隔離/限并發/調 GC”多層優化落地。
📌 經驗教訓:遇到 Load 高問題,不要只盯 CPU 使用率;更多時候是 IO + 鎖 在背后搗鬼。