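I: Background
1. The story
In the previous post we walked through a simple use of minhook. This post shares a real-world case of using minhook in dump analysis. Start with the thread stack below.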
0:044> ~~[138c]s
win32u!NtUserMessageCall+0x14:
00007ffc`5c891184 c3 ret
0:061> k
 # Child-SP          RetAddr               Call Site
00 0000008c`00ffec68 00007ffc`5f21bfbe win32u!NtUserMessageCall+0x14
01 0000008c`00ffec70 00007ffc`5f21be38 user32!SendMessageWorker+0x11e
02 0000008c`00ffed10 00007ffc`124fd4af user32!SendMessageW+0xf8
03 0000008c`00ffed70 00007ffc`125e943b cogxImagingDevice!DllUnregisterServer+0x3029f
04 0000008c`00ffeda0 00007ffc`125e9685 cogxImagingDevice!DllUnregisterServer+0x11c22b
05 0000008c`00ffede0 00007ffc`600b50e7 cogxImagingDevice!DllUnregisterServer+0x11c475
06 0000008c`00ffee20 00007ffc`60093ccd ntdll!LdrpCallInitRoutine+0x6f
07 0000008c`00ffee90 00007ffc`60092eef ntdll!LdrpProcessDetachNode+0xf5
08 0000008c`00ffef60 00007ffc`600ae319 ntdll!LdrpUnloadNode+0x3f
09 0000008c`00ffefb0 00007ffc`600ae293 ntdll!LdrpDecrementModuleLoadCountEx+0x71
0a 0000008c`00ffefe0 00007ffc`5cd7c00e ntdll!LdrUnloadDll+0x93
0b 0000008c`00fff010 00007ffc`5d47cf78 KERNELBASE!FreeLibrary+0x1e
0c 0000008c`00fff040 00007ffc`5d447aa3 combase!CClassCache::CDllPathEntry::CFinishObject::Finish+0x28 [onecore\com\combase\objact\dllcache.cxx @ 3420]
0d 0000008c`00fff070 00007ffc`5d4471a9 combase!CClassCache::CFinishComposite::Finish+0x4b [onecore\com\combase\objact\dllcache.cxx @ 3530]
0e 0000008c`00fff0a0 00007ffc`5d3f1499 combase!CClassCache::FreeUnused+0xdd [onecore\com\combase\objact\dllcache.cxx @ 6547]
0f 0000008c`00fff650 00007ffc`5d3f13c7 combase!CoFreeUnusedLibrariesEx+0x89 [onecore\com\combase\objact\dllapi.cxx @ 117]
10 (Inline Function) --------`-------- combase!CoFreeUnusedLibraries+0xa [onecore\com\combase\objact\dllapi.cxx @ 74]
11 0000008c`00fff690 00007ffc`6008a019 combase!CDllHost::MTADllUnloadCallback+0x17 [onecore\com\combase\objact\dllhost.cxx @ 929]
12 0000008c`00fff6c0 00007ffc`6008bec4 ntdll!TppTimerpExecuteCallback+0xa9
13 0000008c`00fff710 00007ffc`5f167e94 ntdll!TppWorkerThread+0x644
14 0000008c`00fffa00 00007ffc`600d7ad1 kernel32!BaseThreadInitThunk+0x14
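This stack comes from a hang in a .NET industrial automation control system (https://www.cnblogs.com/huangxincheng/p/16544462.html). After some analysis I found the final cause: cogxImagingDevice.dll has a DllMain that handles the unload notification. Anyone familiar with Win32 knows that code running inside DllMain holds the process-wide loader lock (LdrpAcquireLoaderLock). While holding that lock, it sends a message to a window with SendMessageW. Unfortunately the window never responds, so the call blocks forever, the loader lock is never released, and the whole process hangs.
If this is still unclear, I drew a diagram for you; the part in bold black is the core of the problem.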
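To make the mechanism concrete, here is a minimal simulation in plain C#. It is not loader code: `loaderLock` is an ordinary object standing in for the loader lock, `reply` stands in for the answer that SendMessageW waits for, and the class and method names are made up for illustration.

```csharp
using System;
using System.Threading;

// Simulation only: models "hold a lock, then block on a synchronous send".
public static class LoaderLockSimulation
{
    // Returns true if a second thread could take the lock within timeoutMs.
    public static bool CanOtherThreadLoad(bool windowResponds, int timeoutMs)
    {
        object loaderLock = new object();
        using (var lockTaken = new ManualResetEventSlim(false))
        using (var reply = new ManualResetEventSlim(windowResponds))
        {
            // Simulated DllMain: takes the lock, then waits for the window to answer.
            var dllMain = new Thread(() =>
            {
                lock (loaderLock)
                {
                    lockTaken.Set();
                    reply.Wait(); // like SendMessageW: returns only when the target answers
                }
            });
            dllMain.IsBackground = true;
            dllMain.Start();
            lockTaken.Wait();

            // Any other thread that loads or unloads a DLL needs the same lock.
            bool acquired = Monitor.TryEnter(loaderLock, timeoutMs);
            if (acquired) Monitor.Exit(loaderLock);

            reply.Set();    // let the simulated DllMain finish so the demo can end
            dllMain.Join();
            return acquired;
        }
    }
}
```

Calling `LoaderLockSimulation.CanOtherThreadLoad(false, 200)` models the window that never answers: the second thread gives up after 200 ms because the lock is still held, which is what every other thread attempting a DLL load or unload experiences in the real process.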
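II: Looking for a solution
1. The current difficulty
With windbg I can extract the window handle hWnd that was passed to SendMessageW, and from that hWnd I would like to find the process ID and thread ID that created the window. The problem is that these two key pieces of information live in kernel mode on the machine, which means a user-mode dump does not contain them. Without them the problem cannot be tracked down effectively.
There are two ways out:
- Capture a kernel-mode dump: the win32u module is closed source, so digging the answer out of a kernel dump means constantly cross-referencing reactos, which costs time and effort.
- Trace SendMessageW: this is comparatively lightweight and is the focus of this post, namely minhook.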
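2. How to trace SendMessageW
My idea is this: intercept SendMessageW to get the hWnd argument, use hWnd to find the owning process ID and thread ID, then use the process ID to get the process name. With these three pieces of information the culprit has nowhere to hide.
To make it concrete, let's build an example. Create a Win32 window project named WindowsProject1 and deliberately make it hang in its window procedure WndProc. Reference code: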
LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    if (message == WM_CLOSE) {
        Sleep(1000 * 1000);
    }
    // todo....
    return 0;
}
Next, create a console program named ConsoleApplication that sends a close message to WindowsProject1 via SendMessage to reproduce the unexplained hang. The complete code:
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace ConsoleApplication
{
    public static class Program
    {
        private const uint WM_CLOSE = 0x0010;

        public static void Main()
        {
            // Install the hook
            HookManager.InstallHook();

            // Test: send WM_CLOSE (this goes through the hook)
            IntPtr hWnd = FindWindow(null, "WindowsProject1");
            if (hWnd != IntPtr.Zero)
            {
                SendMessage(hWnd, WM_CLOSE, IntPtr.Zero, IntPtr.Zero);
                Console.WriteLine("Sent WM_CLOSE to target window.");
            }
            else
            {
                Console.WriteLine("Target window not found.");
            }

            Console.ReadKey();

            // Remove the hook
            HookManager.UninstallHook();
        }

        [DllImport("user32.dll", CharSet = CharSet.Unicode)]
        private static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

        [DllImport("user32.dll", CharSet = CharSet.Unicode)]
        private static extern IntPtr SendMessage(IntPtr hWnd, uint Msg, IntPtr wParam, IntPtr lParam);
    }

    public static class HookManager
    {
        // Signature of the original SendMessageW
        [UnmanagedFunctionPointer(CallingConvention.StdCall, CharSet = CharSet.Unicode)]
        private delegate IntPtr SendMessageWDelegate(IntPtr hWnd, uint Msg, IntPtr wParam, IntPtr lParam);

        private static SendMessageWDelegate _originalSendMessageW;

        // Keep the detour delegate in a static field so the GC cannot collect it
        // while native code still holds its function pointer.
        private static SendMessageWDelegate _detour;

        private static IntPtr _sendMessageWPtr = IntPtr.Zero;

        public static void InstallHook()
        {
            // 1. Get the address of SendMessageW
            _sendMessageWPtr = MinHook.GetProcAddress(MinHook.GetModuleHandle("user32.dll"), "SendMessageW");
            if (_sendMessageWPtr == IntPtr.Zero)
            {
                Console.WriteLine("Failed to find SendMessageW address.");
                return;
            }

            // 2. Initialize MinHook
            var status = MinHook.MH_Initialize();
            if (status != MinHook.MH_STATUS.MH_OK)
            {
                Console.WriteLine($"MH_Initialize failed: {status}");
                return;
            }

            // 3. Create the hook
            _detour = HookedSendMessageW;
            var detourPtr = Marshal.GetFunctionPointerForDelegate(_detour);
            status = MinHook.MH_CreateHook(_sendMessageWPtr, detourPtr, out var originalPtr);
            if (status != MinHook.MH_STATUS.MH_OK)
            {
                Console.WriteLine($"MH_CreateHook failed: {status}");
                return;
            }
            _originalSendMessageW = Marshal.GetDelegateForFunctionPointer<SendMessageWDelegate>(originalPtr);

            // 4. Enable the hook
            status = MinHook.MH_EnableHook(_sendMessageWPtr);
            if (status != MinHook.MH_STATUS.MH_OK)
            {
                Console.WriteLine($"MH_EnableHook failed: {status}");
                return;
            }

            Console.WriteLine("SendMessageW hook installed successfully!");
        }

        public static void UninstallHook()
        {
            if (_sendMessageWPtr == IntPtr.Zero)
                return;

            // 1. Disable the hook
            var status = MinHook.MH_DisableHook(_sendMessageWPtr);
            if (status != MinHook.MH_STATUS.MH_OK)
                Console.WriteLine($"MH_DisableHook failed: {status}");

            // 2. Uninitialize MinHook
            status = MinHook.MH_Uninitialize();
            if (status != MinHook.MH_STATUS.MH_OK)
                Console.WriteLine($"MH_Uninitialize failed: {status}");

            _sendMessageWPtr = IntPtr.Zero;
            Console.WriteLine("Hook uninstalled.");
        }

        private static IntPtr HookedSendMessageW(IntPtr hWnd, uint Msg, IntPtr wParam, IntPtr lParam)
        {
            Console.WriteLine($"[HOOK] SendMessageW: hWnd=0x{hWnd.ToInt64():X}, Msg=0x{Msg:X}");

            // Get the thread ID and process ID that own the window
            uint processId = 0;
            uint threadId = GetWindowThreadProcessId(hWnd, out processId);

            // Use System.Diagnostics.Process to get the process information
            string processName = "Unknown";
            try
            {
                var targetProcess = System.Diagnostics.Process.GetProcessById((int)processId);
                processName = targetProcess.ProcessName;
                Console.WriteLine($"Window belongs to - ThreadID: {threadId}, ProcessID: {processId}, ProcessName: {processName}");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            // Call the original function
            return _originalSendMessageW(hWnd, Msg, wParam, lParam);
        }

        // Required Win32 API declaration
        [DllImport("user32.dll", SetLastError = true)]
        static extern uint GetWindowThreadProcessId(IntPtr hWnd, out uint processId);
    }

    public static class MinHook
    {
        public enum MH_STATUS
        {
            MH_OK = 0,
            MH_ERROR_ALREADY_INITIALIZED,
            MH_ERROR_NOT_INITIALIZED,
            // ... other status codes
        }

        // MinHook declares its exports as WINAPI (__stdcall), so StdCall is used here.
        [DllImport("MinHook.x86.dll", CallingConvention = CallingConvention.StdCall)]
        public static extern MH_STATUS MH_Initialize();

        [DllImport("MinHook.x86.dll", CallingConvention = CallingConvention.StdCall)]
        public static extern MH_STATUS MH_Uninitialize();

        [DllImport("MinHook.x86.dll", CallingConvention = CallingConvention.StdCall)]
        public static extern MH_STATUS MH_CreateHook(IntPtr pTarget, IntPtr pDetour, out IntPtr ppOriginal);

        [DllImport("MinHook.x86.dll", CallingConvention = CallingConvention.StdCall)]
        public static extern MH_STATUS MH_EnableHook(IntPtr pTarget);

        [DllImport("MinHook.x86.dll", CallingConvention = CallingConvention.StdCall)]
        public static extern MH_STATUS MH_DisableHook(IntPtr pTarget);

        [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
        public static extern IntPtr GetModuleHandle(string lpModuleName);

        [DllImport("kernel32.dll", CharSet = CharSet.Ansi)]
        public static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);
    }
}
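The core of the code is HookedSendMessageW above; it is worth reading closely. Now run WindowsProject1 and then ConsoleApplication. The output is as follows:
The output narrows the investigation considerably: at the very least I now know that a process named WindowsProject1 is what got in my way, and I can dig into WindowsProject1 from there.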
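3. Can it be better
The investigation scope is now much narrower, but one thing is still not ideal. If the window was created by the current process, fine; if it was not, it would be best to capture a dump of the other process as well.
So how do we capture the other process's dump? To keep the approach general, I suggest calling procdump from the current process to capture it automatically. Reference code: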
using System;
using System.Diagnostics;

namespace ConsoleApplication
{
    public class DumpGen
    {
        // Generate a dump file for the given process
        public static void GenerateProcessDump(int processId, string dumpPath)
        {
            try
            {
                // ProcDump command-line arguments:
                // -mm: write a MiniDump
                // -accepteula: accept the license automatically (avoids the first-run prompt)
                string procDumpPath = $@"{Environment.CurrentDirectory}\procdump.exe";
                string arguments = $"-accepteula -mm {processId} \"{dumpPath}\"";

                var startInfo = new ProcessStartInfo
                {
                    FileName = procDumpPath,
                    Arguments = arguments,
                    UseShellExecute = false,
                    CreateNoWindow = true,
                    RedirectStandardOutput = true,
                    RedirectStandardError = true
                };

                using (var proc = new Process { StartInfo = startInfo })
                {
                    proc.Start();

                    // Drain the redirected streams so ProcDump cannot block on a full pipe
                    proc.BeginOutputReadLine();
                    proc.BeginErrorReadLine();

                    proc.WaitForExit();
                    Console.WriteLine($"ProcDump exited with code {proc.ExitCode}");
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Failed to launch ProcDump: {ex.Message}");
            }
        }
    }
}
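Then modify HookedSendMessageW so that a dump is captured automatically if _originalSendMessageW times out. This is only a simple demonstration; you can write more elaborate logic to suit your situation, for example keeping each hWnd in a dictionary and capturing the process dump once its timeout elapses. Reference code: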
// Requires: using System.IO; using System.Threading; using System.Threading.Tasks;
private static IntPtr HookedSendMessageW(IntPtr hWnd, uint Msg, IntPtr wParam, IntPtr lParam)
{
    Console.WriteLine($"[HOOK] SendMessageW: hWnd=0x{hWnd.ToInt64():X}, Msg=0x{Msg:X}");

    // Get the thread ID and process ID that own the window
    uint processId = 0;
    uint threadId = GetWindowThreadProcessId(hWnd, out processId);

    // Signaled once the original call returns, so the watchdog only fires on a real timeout
    var completed = new ManualResetEventSlim(false);

    // Use System.Diagnostics.Process to get the process information
    string processName = "Unknown";
    try
    {
        var targetProcess = System.Diagnostics.Process.GetProcessById((int)processId);
        processName = targetProcess.ProcessName;
        Console.WriteLine($"Window belongs to - ThreadID: {threadId}, ProcessID: {processId}, ProcessName: {processName}");

        // Watchdog: capture a dump automatically if the call has not returned after 3 seconds
        Task.Run(() =>
        {
            if (!completed.Wait(3000) && Msg == 0x0010)
            {
                string dumpPath = Path.Combine(Environment.CurrentDirectory, $"ProcessDump_{processName}_{DateTime.Now:yyyyMMddHHmmss}.dmp");
                Console.WriteLine($"Launching ProcDump to generate dump: {dumpPath}");
                DumpGen.GenerateProcessDump(targetProcess.Id, dumpPath);
            }
        });
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }

    // Call the original function
    try
    {
        return _originalSendMessageW(hWnd, Msg, wParam, lParam);
    }
    finally
    {
        completed.Set();
    }
}
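The dictionary idea mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not the author's implementation: `PendingSendTracker`, `Entry`, `Begin`, `End` and `GetOverdue` are hypothetical names, and entries are keyed by a ticket number rather than by hWnd because several threads may be sending to the same window at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: remembers every in-flight SendMessageW call
// so that a watchdog can find the ones that have been pending too long.
public sealed class PendingSendTracker
{
    public sealed class Entry
    {
        public IntPtr HWnd;
        public uint ProcessId;
        public DateTime StartedUtc;
    }

    private readonly ConcurrentDictionary<long, Entry> _pending =
        new ConcurrentDictionary<long, Entry>();
    private long _nextId;

    // Call right before invoking the original SendMessageW; returns a ticket.
    public long Begin(IntPtr hWnd, uint processId, DateTime nowUtc)
    {
        long id = Interlocked.Increment(ref _nextId);
        _pending[id] = new Entry { HWnd = hWnd, ProcessId = processId, StartedUtc = nowUtc };
        return id;
    }

    // Call in a finally block once the original SendMessageW has returned.
    public void End(long id)
    {
        Entry ignored;
        _pending.TryRemove(id, out ignored);
    }

    // Calls that have been pending for at least 'timeout'.
    public List<Entry> GetOverdue(DateTime nowUtc, TimeSpan timeout)
    {
        var overdue = new List<Entry>();
        foreach (var pair in _pending)
        {
            if (nowUtc - pair.Value.StartedUtc >= timeout)
                overdue.Add(pair.Value);
        }
        return overdue;
    }
}
```

In the hook, `Begin` would wrap the call to `_originalSendMessageW` and `End` would sit in a `finally` block; a single timer thread could then poll `GetOverdue` and pass each `ProcessId` to `DumpGen.GenerateProcessDump`, instead of starting one task per intercepted call.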
With everything in place, run the program. Screenshot below:
Open the generated dump file and find the target thread:
0:000> ~
.  0  Id: 4f34.338c Suspend: 0 Teb: 009c2000 Unfrozen
   1  Id: 4f34.6470 Suspend: 0 Teb: 009d2000 Unfrozen
   2  Id: 4f34.62a8 Suspend: 0 Teb: 009d6000 Unfrozen
0:000> ? 4f34 ; ? 338c; k
Evaluate expression: 20276 = 00004f34
Evaluate expression: 13196 = 0000338c
 # ChildEBP RetAddr
00 00b3f910 77a23999 ntdll!NtDelayExecution+0xc
01 00b3f930 776a8760 ntdll!RtlDelayExecution+0xe9
02 00b3f998 776a86ff KERNELBASE!SleepEx+0x50
03 00b3f9a8 00f81be3 KERNELBASE!Sleep+0xf
04 00b3fae8 76b36d13 WindowsProject1!WndProc+0x43 [D:\sources\woodpecker\ConsoleApplication\WindowsProject1\WindowsProject1.cpp @ 127]
05 00b3fb14 76b2540d user32!_InternalCallWinProc+0x2b
06 00b3fc18 76b24eb0 user32!UserCallWinProcCheckWow+0x49d
07 00b3fc7c 76b31709 user32!DispatchClientMessage+0x190
08 00b3fcb8 77a0bb66 user32!__fnDWORD+0x39
09 00b3fcf0 76b33ef0 ntdll!KiUserCallbackDispatcher+0x36
0a 00b3fd2c 00f81e9b user32!GetMessageW+0x30
0b 00b3fe44 00f8273d WindowsProject1!wWinMain+0xbb [D:\sources\woodpecker\ConsoleApplication\WindowsProject1\WindowsProject1.cpp @ 46]
0c 00b3fe64 00f8258a WindowsProject1!invoke_main+0x2d [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 123]
0d 00b3fec0 00f8241d WindowsProject1!__scrt_common_main_seh+0x15a [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
0e 00b3fec8 00f827b8 WindowsProject1!__scrt_common_main+0xd [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 331]
0f 00b3fed0 76705d49 WindowsProject1!wWinMainCRTStartup+0x8 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_wwinmain.cpp @ 17]
10 00b3fee0 779fcebb kernel32!BaseThreadInitThunk+0x19
11 00b3ff38 779fce41 ntdll!__RtlUserThreadStart+0x2b
12 00b3ff48 00000000 ntdll!_RtlUserThreadStart+0x1b
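From the stack, the hang turns out to be because the main thread is sitting in KERNELBASE!Sleep. Speechless. With that, the truth behind this hang is out in the open.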
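One practical detail when matching the dump against the hook's log: the hook prints ProcessID and ThreadID in decimal, while windbg's `~` output shows them as a hexadecimal `pid.tid` pair, which is why the session above evaluates `? 4f34 ; ? 338c`. If you want to do the same conversion in a script, a small helper along these lines would work (`DebuggerIds` is a made-up name for illustration):

```csharp
using System;
using System.Globalization;

public static class DebuggerIds
{
    // Parses a windbg "pid.tid" pair such as "4f34.338c"; both parts are hexadecimal.
    public static void Parse(string hexPair, out int processId, out int threadId)
    {
        string[] parts = hexPair.Split('.');
        if (parts.Length != 2)
            throw new FormatException("Expected <pid>.<tid> in hex, e.g. 4f34.338c");

        processId = int.Parse(parts[0], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        threadId = int.Parse(parts[1], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

For the thread shown above, `DebuggerIds.Parse("4f34.338c", out pid, out tid)` yields 20276 and 13196, the same values windbg printed.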
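III: Summary
Looking back at the hang caused by cogxImagingDevice.dll at the start of this post: with the approach described here, the investigation becomes very lightweight. No more capturing kernel dumps, and no more fiddling around with spy++ on the customer's machine. Perfect!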