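I. Background
1. The story
A while ago a friend reached out to me on WeChat. His program had a memory leak, and he asked whether I could take a look. It is a fairly classic problem, and since I haven't written about unmanaged memory in a while, I'm sharing the analysis here. Without further ado, let's let WinDbg do the talking.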
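II. WinDbg Analysis
1. Where exactly is the leak?
A good start is half the battle; get it wrong and you end up heading in the opposite direction. As usual, start with the classic !address -summary to see how memory is laid out.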
0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Heap                                   1935          553b3000 (   1.332 GB)  70.57%   66.59%
Image                                  1022           c306000 ( 195.023 MB)  10.09%    9.52%
<unknown>                              1202           c09d000 ( 192.613 MB)   9.97%    9.41%
Stack                                   541           b280000 ( 178.500 MB)   9.24%    8.72%
Free                                   1158           73ab000 ( 115.668 MB)            5.65%
TEB                                     180            20f000 (   2.059 MB)   0.11%    0.10%
Other                                     8             5d000 ( 372.000 kB)   0.02%    0.02%
PEB                                       1              3000 (  12.000 kB)   0.00%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_COMMIT                             3077          643c6000 (   1.566 GB)  83.00%   78.31%
MEM_RESERVE                            1812          1487f000 ( 328.496 MB)  17.00%   16.04%
MEM_FREE                               1158           73ab000 ( 115.668 MB)            5.65%
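The Total Size column is hex, so a quick conversion helps when reading these figures. The small Python sketch below is just a hypothetical helper of mine, not part of WinDbg: it converts the copied values to GiB and recomputes the Heap share of "busy" memory, assuming busy means committed plus reserved (which is consistent with MEM_COMMIT and MEM_RESERVE adding up to 100% in the %ofBusy column).

```python
# Hex "Total Size" values copied from the !address -summary output above.
SIZES = {
    "Heap": 0x553B3000,
    "MEM_COMMIT": 0x643C6000,
    "MEM_RESERVE": 0x1487F000,
    "MEM_FREE": 0x73AB000,
}

GIB = 1024 ** 3  # WinDbg's "GB" here is binary (2**30 bytes)


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Busy address space = committed + reserved (everything that is not MEM_FREE).
busy = SIZES["MEM_COMMIT"] + SIZES["MEM_RESERVE"]

if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name:<12} {size:#010x} = {to_gib(size):.3f} GiB")
    print(f"Heap share of busy: {SIZES['Heap'] / busy:.2%}")
```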
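The output shows MEM_COMMIT at 1.566 GB, of which Heap accounts for 1.332 GB. That is far more than my friend expected. The managed GC heap would be reported under <unknown>, which is only about 192 MB, so this is clearly an unmanaged memory leak. Since it is the NT heap that grew, let's dig in and use !heap -s to look at each heap handle.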
0:000> !heap -s

************************************************************************************************************************
                                              NT HEAP STATS BELOW
************************************************************************************************************************
LFH Key                   : 0xbb72f2a3
Termination on corruption : DISABLED
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast
                    (k)     (k)    (k)     (k) length      blocks cont. heap
-----------------------------------------------------------------------------
00770000 00000002   16576   9716  16364     33   195     5    0      0   LFH
006f0000 00001002    1292    148   1080     11     4     2    0      0   LFH
00a80000 00001002    3336   1972   3124     88    25     3    0      0   LFH
02460000 00001002      60      4     60      0     1     1    0      0
023b0000 00041002      60      4     60      2     1     1    0      0
02450000 00001002     272     24     60      1     3     1    0      0   LFH
04a40000 00041002    1292     80   1080      8     4     2    0      0   LFH
06e90000 00001002   64180  56660  63968   1434   473     9  624      7   LFH
09dc0000 00001002      60     12     60      3     2     1    0      0
0a500000 00001002    7428   3772   7216     43    35     4    0      0   LFH
-----------------------------------------------------------------------------
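Judging by the Commit column, none of the heaps is large; the biggest is only about 56 MB. An experienced eye, however, will notice that Virt blocks reaches 624 for heap 06e90000. Anyone familiar with the NT heap knows that any heap entry larger than roughly 512 KB (the heap's VirtualMemoryThreshold) is allocated directly from virtual memory and tracked separately in the VirtualAllocdBlocks list, outside the segments counted in the Commit column. You can display it with dt ntdll!_HEAP 06e90000.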
0:000> dt ntdll!_HEAP 06e90000
...
   +0x05c VirtualMemoryThreshold : 0xfe00
...
   +0x09c VirtualAllocdBlocks : _LIST_ENTRY [ 0x6ea4000 - 0x7c0d0000 ]
...
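VirtualMemoryThreshold is not stored in bytes: it is counted in heap allocation granules, and the !heap 06e90000 -m output for this same heap reports a granularity of 8 bytes. So 0xfe00 × 8 = 0x7f000 bytes, about 508 KB, which is where the "roughly 512 KB" rule of thumb comes from. A minimal Python sketch of that arithmetic follows; the routing function is a simplified assumption, not the full ntdll allocation logic.

```python
# Values for heap 06e90000 taken from the debugger output:
#   dt ntdll!_HEAP 06e90000 -> VirtualMemoryThreshold : 0xfe00 (in heap granules)
#   !heap 06e90000 -m       -> Granularity: 8 bytes
VIRTUAL_MEMORY_THRESHOLD = 0xFE00
GRANULARITY = 8

threshold_bytes = VIRTUAL_MEMORY_THRESHOLD * GRANULARITY


def goes_to_virtual_alloc(request_bytes):
    """Simplified routing rule (an assumption, not the full ntdll logic):
    requests above the threshold skip the heap segments and are tracked
    in the VirtualAllocdBlocks list instead."""
    return request_bytes > threshold_bytes


if __name__ == "__main__":
    print(f"threshold = {threshold_bytes:#x} bytes = {threshold_bytes // 1024} KiB")
    for size in (64 * 1024, 0x200000):
        print(f"{size:#x} -> VirtualAllocdBlocks: {goes_to_virtual_alloc(size)}")
```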
To get a readable dump of the VirtualAllocdBlocks list, use WinDbg's built-in heap command:
0:000> !heap 06e90000 -m
Index   Address  Name      Debugging options enabled
  8:   06e90000
    Segment at 06e90000 to 06e9f000 (0000f000 bytes committed)
    Segment at 078f0000 to 079ef000 (000ff000 bytes committed)
    Segment at 08870000 to 08a6f000 (001ff000 bytes committed)
    Segment at 0ec60000 to 0f05f000 (003f9000 bytes committed)
    Segment at 18660000 to 18e5f000 (007fa000 bytes committed)
    Segment at 26b20000 to 27aef000 (00fc0000 bytes committed)
    Segment at 45320000 to 462ef000 (00fcf000 bytes committed)
    Segment at 65bf0000 to 66bbf000 (008bf000 bytes committed)
    Flags:                00001002
    ForceFlags:           00000000
    Granularity:          8 bytes
    Segment Reserve:      03f70000
    Segment Commit:       00002000
    DeCommit Block Thres: 00000800
    DeCommit Total Thres: 00002000
    Total Free Size:      0002cd56
    Max. Allocation Size: 7ffdefff
    Lock Variable at:     06e90258
    Next TagIndex:        0000
    Maximum TagIndex:     0000
    Tag Entries:          00000000
    PsuedoTag Entries:    00000000
    Virtual Alloc List:   06e9009c
        06ea4000: 00200000 [commited 201000, unused 1000] - busy (b)
        070b2000: 00200000 [commited 201000, unused 1000] - busy (b)
        079f4000: 00200000 [commited 201000, unused 1000] - busy (b)
        07c0f000: 00200000 [commited 201000, unused 1000] - busy (b)
        0802b000: 00200000 [commited 201000, unused 1000] - busy (b)
        08238000: 00200000 [commited 201000, unused 1000] - busy (b)
        08444000: 00200000 [commited 201000, unused 1000] - busy (b)
        0865f000: 00200000 [commited 201000, unused 1000] - busy (b)
        0e20f000: 00200000 [commited 201000, unused 1000] - busy (b)
        0e42b000: 00200000 [commited 201000, unused 1000] - busy (b)
        0e635000: 00200000 [commited 201000, unused 1000] - busy (b)
        0e841000: 00200000 [commited 201000, unused 1000] - busy (b)
        0c661000: 00200000 [commited 201000, unused 1000] - busy (b)
        0c87e000: 00200000 [commited 201000, unused 1000] - busy (b)
        0ca8b000: 00200000 [commited 201000, unused 1000] - busy (b)
        0ea56000: 00200000 [commited 201000, unused 1000] - busy (b)
        0f062000: 00200000 [commited 201000, unused 1000] - busy (b)
        0f275000: 00200000 [commited 201000, unused 1000] - busy (b)
        ...
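The output shows a long run of blocks marked commited 201000, unused 1000. 0x201000 bytes is roughly 2 MB, and from experience, 2 MB blocks like these are usually PDFs, images, bitmaps, and the like. Because neither page heap nor the user-mode stack trace database (ust) was enabled, there is no record of which code allocated them, so this line of investigation stops here.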
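As a sanity check that these blocks alone can explain the Heap figure from !address -summary, the Python sketch below (helper names are my own) parses a few entries copied from the Virtual Alloc List and multiplies the committed size by the 624 Virt blocks reported by !heap -s. That extrapolation assumes every block has the same size, which the visible part of the list suggests but the elided "..." does not prove. Under that assumption, 624 × 0x201000 bytes is about 1.221 GiB, most of the 1.332 GB Heap total.

```python
import re

# A few entries copied from the "Virtual Alloc List" section of !heap 06e90000 -m.
SAMPLE = """
06ea4000: 00200000 [commited 201000, unused 1000] - busy (b)
070b2000: 00200000 [commited 201000, unused 1000] - busy (b)
079f4000: 00200000 [commited 201000, unused 1000] - busy (b)
"""

# address: user size [commited X, unused Y] - busy  (WinDbg spells it "commited")
ENTRY = re.compile(
    r"([0-9a-f]{8}):\s+([0-9a-f]+)\s+\[commited\s+([0-9a-f]+),"
    r"\s+unused\s+([0-9a-f]+)\]\s+-\s+busy",
    re.IGNORECASE,
)

MIB = 1024 ** 2
GIB = 1024 ** 3


def parse_virtual_blocks(text):
    """Return (address, user_size, committed) integer tuples for busy blocks."""
    return [
        (int(addr, 16), int(size, 16), int(commit, 16))
        for addr, size, commit, _unused in ENTRY.findall(text)
    ]


if __name__ == "__main__":
    blocks = parse_virtual_blocks(SAMPLE)
    committed = blocks[0][2]
    print(f"parsed {len(blocks)} blocks, {committed / MIB:.3f} MiB committed each")
    # Assumption: all 624 Virt blocks from !heap -s have this size (the listing is elided).
    print(f"624 x {committed:#x} = {624 * committed / GIB:.3f} GiB")
```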
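2. Who allocated the 2 MB blocks?
For a block to land in the VirtualAllocdBlocks list, some higher-level code must have called an API such as HeapAlloc. The size strongly suggests large objects such as a Bitmap or a PDF, and it is quite likely that the managed code did something that kept these resources from being released. Next, use !dumpheap -stat to look at the managed heap.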
0:000> !dumpheap -stat
Statistics:
      MT    Count    TotalSize Class Name
...
09ae7e48      627        15048 System.Drawing.Bitmap
6b267040      178       366680 System.Decimal[]
6b2cb4a0     1850       601588 System.String[]
6b2cdd14     1379       638190 System.Byte[]
6b2cac14    15919      1146764 System.String
09aec720    66332      1326640 System.Drawing.FontFamily
09ae8590    66074      2907256 System.Drawing.Font
Total 289300 objects
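!dumpheap -stat only sees the managed side. Dividing TotalSize by Count, as in this small Python sketch (a hypothetical helper; rows copied from the output above), shows that each Bitmap object is only 24 bytes on the managed heap. If the Bitmap theory is right, each one's roughly 2 MB of pixel data lives in native memory that these managed statistics never show, which is why the leak only surfaced as NT heap growth.

```python
# Rows copied from the !dumpheap -stat output above: MT, Count, TotalSize, Class Name.
ROWS = """
09ae7e48      627        15048 System.Drawing.Bitmap
6b267040      178       366680 System.Decimal[]
6b2cb4a0     1850       601588 System.String[]
6b2cdd14     1379       638190 System.Byte[]
6b2cac14    15919      1146764 System.String
09aec720    66332      1326640 System.Drawing.FontFamily
09ae8590    66074      2907256 System.Drawing.Font
"""

VIRT_BLOCKS = 624  # "Virt blocks" column for heap 06e90000 in !heap -s


def parse_stat(text):
    """Map class name -> (instance count, total managed bytes)."""
    stats = {}
    for line in text.strip().splitlines():
        _mt, count, total, name = line.split(maxsplit=3)
        stats[name] = (int(count), int(total))
    return stats


if __name__ == "__main__":
    stats = parse_stat(ROWS)
    by_count = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
    for name, (count, total) in by_count:
        print(f"{name:<26} count={count:>6}  avg bytes/object={total // count}")
    bitmaps = stats["System.Drawing.Bitmap"][0]
    print(f"Bitmap objects: {bitmaps}, Virt blocks on heap 06e90000: {VIRT_BLOCKS}")
```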
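The output shows more than 66,000 System.Drawing.Font instances, and the 627 System.Drawing.Bitmap instances are very close to the 624 Virt blocks on the NT heap. It looks like the Bitmaps are the culprit. So why weren't they cleaned up? Use !frq -stat to check the finalizer's Freachable Queue: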
0:000> !frq -stat
Freachable Queue:

       Count      Total Size   Type
---------------------------------------------------------
         152            3648   System.Data.Odbc.CNativeBuffer
          76            2128   System.Data.Odbc.OdbcConnectionHandle
          77            1540   System.Transactions.SafeIUnknown
          76            1824   System.Data.Odbc.OdbcStatementHandle
        2432          145920   System.Windows.Forms.Control+ControlNativeWindow
         304            7296   System.Drawing.Bitmap
       66062         2906728   System.Drawing.Font
         258            5160   System.Drawing.FontFamily
         308            9856   System.Drawing.Graphics
         308            3696   System.Windows.Forms.ImageList+NativeImageList
           1              12   System.Drawing.Text.InstalledFontCollection
          12             240   System.Threading.ThreadPoolWorkQueueThreadLocals
           1              20   System.Security.Cryptography.SafeKeyHandle
           1              20   Microsoft.Win32.SafeHandles.SafeWaitHandle
           6             120   Microsoft.Win32.SafeHandles.SafeRegistryHandle
          12             624   System.Threading.Thread
        1577           69388   System.Threading.ReaderWriterLock
           1              20   System.Security.Cryptography.SafeProvHandle

71,664 objects, 3,158,240 bytes
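Whoa, the Freachable Queue holds about 71,000 objects. That is a serious red flag: objects pile up there when their finalizers are not being run, so the finalizer thread has most likely hung. Next, let's see what the finalizer thread is doing.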
0:000> !t
ThreadCount:      107
UnstartedThread:  0
BackgroundThread: 93
PendingThread:    0
DeadThread:       12
Hosted Runtime:   no
                                                                         Lock
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 138ac 0079fef0     26020 Preemptive  00000000:00000000 00798f38 1     STA
   2    2 12b08 007adac0     2b220 Preemptive  00000000:00000000 00798f38 0     MTA (Finalizer)
...
0:000> ~2s
eax=00000001 ebx=00000000 ecx=00000000 edx=00000000 esi=00000001 edi=00000001
eip=777b2f8c esp=0466eaf4 ebp=0466ec84 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
ntdll!NtWaitForMultipleObjects+0xc:
777b2f8c c21400          ret     14h
0:002> k
 # ChildEBP RetAddr
00 0466ec84 762fc753     ntdll!NtWaitForMultipleObjects+0xc
01 0466ec84 7695d9aa     KERNELBASE!WaitForMultipleObjectsEx+0x103
02 0466ed34 7695c564     combase!MTAThreadWaitForCall+0x1ca [onecore\com\combase\dcomrem\channelb.cxx @ 7234]
03 0466edc0 769a9923     combase!MTAThreadDispatchCrossApartmentCall+0xf4 [onecore\com\combase\dcomrem\chancont.cxx @ 229]
04 (Inline) --------     combase!CSyncClientCall::SwitchAptAndDispatchCall+0x9e4 [onecore\com\combase\dcomrem\channelb.cxx @ 5856]
05 0466efa0 769ab739     combase!CSyncClientCall::SendReceive2+0xad3 [onecore\com\combase\dcomrem\channelb.cxx @ 5459]
06 (Inline) --------     combase!SyncClientCallRetryContext::SendReceiveWithRetry+0x29 [onecore\com\combase\dcomrem\callctrl.cxx @ 1542]
07 (Inline) --------     combase!CSyncClientCall::SendReceiveInRetryContext+0x29 [onecore\com\combase\dcomrem\callctrl.cxx @ 565]
...
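Spotting the telltale frame by eye is easy here, but the same check can be scripted when you have many threads to triage. This Python sketch pulls the symbol column out of k output and flags frames suggesting a pending cross-apartment COM call; the marker list is a hypothetical heuristic of mine, not a WinDbg feature, and is far from exhaustive.

```python
# The finalizer thread's `k` output from above (source annotations trimmed).
K_OUTPUT = """\
00 0466ec84 762fc753     ntdll!NtWaitForMultipleObjects+0xc
01 0466ec84 7695d9aa     KERNELBASE!WaitForMultipleObjectsEx+0x103
02 0466ed34 7695c564     combase!MTAThreadWaitForCall+0x1ca
03 0466edc0 769a9923     combase!MTAThreadDispatchCrossApartmentCall+0xf4
04 (Inline) --------     combase!CSyncClientCall::SwitchAptAndDispatchCall+0x9e4
05 0466efa0 769ab739     combase!CSyncClientCall::SendReceive2+0xad3
"""

# Hypothetical marker list: substrings suggesting the thread is waiting for
# another COM apartment to service a call.
CROSS_APARTMENT_MARKERS = ("CrossApartmentCall", "SwitchAptAndDispatchCall", "SendReceive")


def symbols(k_text):
    """Yield the symbol column (4th whitespace-separated field) of each frame line."""
    for line in k_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            yield fields[3]


def cross_apartment_frames(k_text):
    """Return the frames whose symbol contains one of the markers."""
    return [s for s in symbols(k_text) if any(m in s for m in CROSS_APARTMENT_MARKERS)]


if __name__ == "__main__":
    hits = cross_apartment_frames(K_OUTPUT)
    print("blocked on a cross-apartment COM call" if hits else "no COM marker found")
    for frame in hits:
        print("   ", frame)
```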
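The MTAThreadDispatchCrossApartmentCall frame tells us this is yet another classic case of the finalizer thread hanging while releasing COM objects. The finalizer runs in the MTA (see the Apt column of !t above), so to release a COM object that belongs to an STA it has to make a cross-apartment call into the owning STA thread, and it is stuck waiting for that thread to respond. Looking more closely at the Apt column of the thread list, a large number of threads are running in STA mode.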
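To make the chain of cause and effect concrete, here is a toy Python model. It is not how the CLR or COM is implemented, and every class name in it is hypothetical: a single "finalizer" runs finalizers strictly in order, the first object needs to hop into an STA, and the bitmaps queued behind it keep their native buffers until that hop completes. The timeout exists only so the model terminates; the real finalizer would wait forever.

```python
import queue
import threading


class StaThread:
    """Toy STA: calls marshalled to it run only if the thread pumps its inbox."""

    def __init__(self, pumps_messages):
        self.inbox = queue.Queue()
        if pumps_messages:
            threading.Thread(target=self._pump, daemon=True).start()
        # A non-pumping STA is busy with its own work and never reads the inbox.

    def _pump(self):
        while True:
            call, done = self.inbox.get()
            call()
            done.set()

    def invoke(self, call, timeout):
        """Marshal `call` to this thread and wait, like a cross-apartment COM call.
        Returns False if it did not complete within `timeout` seconds."""
        done = threading.Event()
        self.inbox.put((call, done))
        return done.wait(timeout)


class ComWrapper:
    """Hypothetical object whose finalizer must release a COM pointer on its home STA."""

    def __init__(self, sta):
        self.sta = sta

    def finalize(self, timeout):
        return self.sta.invoke(lambda: None, timeout)  # stands in for Release()


class FakeBitmap:
    """Hypothetical stand-in for a Bitmap whose native buffer is freed only by its finalizer."""

    def __init__(self, native_bytes=0x201000):
        self.native_bytes = native_bytes

    def finalize(self, timeout):
        return True  # frees the native buffer; no apartment switch needed


def drain_finalizer_queue(items, timeout=0.2):
    """Model one finalizer thread running finalizers in order.
    Returns the native bytes still held by objects it never reached."""
    pending = list(items)
    while pending and pending[0].finalize(timeout):
        pending.pop(0)
    # The real finalizer has no timeout: it would stay blocked on pending[0].
    return sum(getattr(obj, "native_bytes", 0) for obj in pending)


if __name__ == "__main__":
    for pumps in (True, False):
        sta = StaThread(pumps_messages=pumps)
        items = [ComWrapper(sta)] + [FakeBitmap() for _ in range(10)]
        held = drain_finalizer_queue(items)
        print(f"STA pumps messages={pumps}: native bytes still held = {held:#x}")
```

The design choice to make the queue strictly ordered mirrors the observation above: one stuck finalizer is enough to strand everything queued after it, which is how a COM problem ends up as a Bitmap leak.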

The next step was to report these findings to my friend and ask why the threads he creates all use the STA apartment model.
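III. Summary
In short, the leak happened because my friend's code created threads in STA mode, and judging by the hung call, at least one of them was not servicing incoming cross-apartment calls. The finalizer thread blocked while calling into it, so the Bitmaps queued behind it were never finalized, their native memory was never released, and the result was an unmanaged memory leak.
The lesson from this dump: don't give up; even when things look hopeless, there is a way out.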